aoxo committed · Commit cf42685 · verified · 1 Parent(s): c4b9abd

Update README.md

Files changed (1): README.md (+76 -101)
@@ -140,10 +140,75 @@ Images and their corresponding style semantic maps were resized to fit the input
140
  - v1_3: 93M params
141
  - v2_1: 2.9M params
142
 
143
  **Architecture:** The latest model, v2_1, introduces Location-based Multi-head Attention (LbMhA) to improve feature extraction with fewer parameters. Its three predecessors attained a similar level of accuracy without the LbMhA layers. The general architecture is as follows:
144
 
145
  ```python
146
- 223543305
147
  DataParallel(
148
  (module): ViTImage2Image(
149
  (patch_embed): Conv2d(3, 768, kernel_size=(16, 16), stride=(16, 16))
@@ -242,115 +307,25 @@ DataParallel(
242
  )
243
  ```
244
 
245
- **Training hardware:** Each model was trained on 2 x T4 GPUs (multi-GPU training). For this reason, linear attention modules were implemented as ring (distributed) attention during training.
246
- **Total Training Compute Throughput:** 4.13 TFLOPS
247
- **Total Logged Training Time:** ~210 hours (total time split across four models including overhead)
248
- **Start Time:** 09-13-2024
249
- **End Time:** 09-21-2024
250
- **Checkpoint Size:**
251
- - v1_1: 855 MB
252
- - v1_2: 764 MB
253
- - v1_3: 355 MB
254
- - v2_1: 11 MB
255
-
256
- ## Evaluation Data, Metrics & Results
257
-
258
- This section covers information on how the model was evaluated at each stage.
259
-
260
- ### Evaluation Data
261
-
262
- Evaluation was performed on real-time footage captured from Grand Theft Auto V, Cyberpunk 2077, and Watch Dogs 2.
263
-
264
- ### Metrics
265
-
266
- - PSNR (Peak Signal-to-Noise Ratio)
267
- - Combined loss (L1 loss + Total Variation loss)
268
-
269
- ### Results
270
-
271
- - In-game ![ingame-car](ingame-car.jpg)
272
-
273
- - Ours ![ours-car](ours-car.jpg)
274
-
275
- - In-game ![ingame-car2](ingame-car2.png)
276
-
277
- - Ours ![ours-car2](ours-car2.png)
278
-
279
- - In-game ![ingame-roads](ingame-roads.png)
280
-
281
- - Ours ![ours-roads](ours-roads.png)
282
-
283
- - In-game ![ingame-roads2](ingame-roads2.png)
284
-
285
- - Ours ![ours-roads2](ours-roads2.png)
286
-
287
-
288
- #### Summary
289
-
290
-
291
-
292
- ## Model Examination [optional]
293
-
294
- <!-- Relevant interpretability work for the model goes here -->
295
-
296
- [More Information Needed]
297
-
298
- ## Environmental Impact
299
-
300
- <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
301
-
302
- Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
303
-
304
- - **Hardware Type:** [More Information Needed]
305
- - **Hours used:** [More Information Needed]
306
- - **Cloud Provider:** [More Information Needed]
307
- - **Compute Region:** [More Information Needed]
308
- - **Carbon Emitted:** [More Information Needed]
309
-
310
- ## Technical Specifications [optional]
311
-
312
- ### Model Architecture and Objective
313
-
314
- [More Information Needed]
315
-
316
  ### Compute Infrastructure
317
 
318
- [More Information Needed]
319
-
320
  #### Hardware
321
 
322
- [More Information Needed]
323
 
324
  #### Software
325
 
326
- [More Information Needed]
327
-
328
- ## Citation [optional]
329
-
330
- <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
331
-
332
- **BibTeX:**
333
-
334
- [More Information Needed]
335
-
336
- **APA:**
337
-
338
- [More Information Needed]
339
-
340
- ## Glossary [optional]
341
-
342
- <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
343
-
344
- [More Information Needed]
345
-
346
- ## More Information [optional]
347
-
348
- [More Information Needed]
349
 
350
- ## Model Card Authors [optional]
351
 
352
- [More Information Needed]
353
 
354
  ## Model Card Contact
355
 
356
- [More Information Needed]
 
140
  - v1_3: 93M params
141
  - v2_1: 2.9M params
142
 
143
+ **Training hardware:** Each model was trained on 2 x T4 GPUs (multi-GPU training). For this reason, linear attention modules were implemented as ring (distributed) attention during training.
144
+
145
+ **Total Training Compute Throughput:** 4.13 TFLOPS
146
+
147
+ **Total Logged Training Time:** ~210 hours (total time split across four models including overhead)
148
+
149
+ **Start Time:** 09-13-2024
150
+
151
+ **End Time:** 09-21-2024
152
+
153
+ **Checkpoint Size:**
154
+ - v1_1: 855 MB
155
+ - v1_2: 764 MB
156
+ - v1_3: 355 MB
157
+ - v2_1: 11 MB
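The smaller checkpoint sizes are close to what storing every parameter as a 32-bit float would give. The sketch below works through that arithmetic for the two parameter counts quoted earlier in the card (93M and 2.9M). It assumes fp32 weights, no optimizer state in the file, and that "MB" means MiB; none of these is stated in the card, and `estimated_checkpoint_mib` is a hypothetical helper name.

```python
# Rough checkpoint-size estimate: parameters x bytes per parameter.
# Assumptions (not stated in the model card): fp32 weights (4 bytes each),
# no optimizer state in the file, and "MB" meaning MiB (2**20 bytes).
BYTES_PER_PARAM = 4
MIB = 2 ** 20


def estimated_checkpoint_mib(num_params: float) -> float:
    """Return the approximate on-disk size in MiB for fp32 weights."""
    return num_params * BYTES_PER_PARAM / MIB


# Parameter counts quoted earlier in the card.
for name, params in {"93M-param model": 93e6, "2.9M-param model": 2.9e6}.items():
    print(f"{name}: ~{estimated_checkpoint_mib(params):.0f} MiB")
```

Under these assumptions the estimates come out at roughly 355 MiB and 11 MiB, in line with the two smallest checkpoints listed.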
158
+
159
+ ## Evaluation Data, Metrics & Results
160
+
161
+ This section covers information on how the model was evaluated at each stage.
162
+
163
+ ### Evaluation Data
164
+
165
+ Evaluation was performed on real-time footage captured from Grand Theft Auto V, Cyberpunk 2077, and Watch Dogs 2.
166
+
167
+ ### Metrics
168
+
169
+ - PSNR (Peak Signal-to-Noise Ratio)
170
+ - Combined loss (L1 loss + Total Variation loss)
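The two metrics above can be sketched in a few lines. This is a minimal, dependency-free illustration on single-channel images stored as nested lists, not the author's evaluation code; in practice the same formulas would be applied to PyTorch tensors. The function names, the `max_val` pixel range, and the `tv_weight` factor are assumptions introduced for the example (the card only says the loss is L1 plus Total Variation).

```python
import math


def psnr(pred, target, max_val=1.0):
    """Peak Signal-to-Noise Ratio in dB between two equally sized 2D images."""
    n = len(pred) * len(pred[0])
    mse = sum((p - t) ** 2
              for p_row, t_row in zip(pred, target)
              for p, t in zip(p_row, t_row)) / n
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)


def l1_loss(pred, target):
    """Mean absolute error between two equally sized 2D images."""
    n = len(pred) * len(pred[0])
    return sum(abs(p - t)
               for p_row, t_row in zip(pred, target)
               for p, t in zip(p_row, t_row)) / n


def total_variation(img):
    """Anisotropic TV: summed absolute neighbour differences, divided by pixel count."""
    h, w = len(img), len(img[0])
    vertical = sum(abs(img[y + 1][x] - img[y][x]) for y in range(h - 1) for x in range(w))
    horizontal = sum(abs(img[y][x + 1] - img[y][x]) for y in range(h) for x in range(w - 1))
    return (vertical + horizontal) / (h * w)


def combined_loss(pred, target, tv_weight=1.0):
    """L1 loss plus a Total Variation penalty on the prediction.

    tv_weight is a hypothetical knob; 1.0 gives the plain sum.
    """
    return l1_loss(pred, target) + tv_weight * total_variation(pred)


pred = [[0.0, 0.5], [0.5, 1.0]]
target = [[0.0, 0.5], [0.5, 0.5]]
print(f"PSNR: {psnr(pred, target):.2f} dB, combined loss: {combined_loss(pred, target):.3f}")
```

Higher PSNR means the output is closer to the reference, while the Total Variation term penalizes noisy, high-frequency artifacts in the generated frame.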
171
+
172
+ ### Results
173
+
174
+ - In-game ![ingame-car](ingame-car.jpg)
175
+
176
+ - Ours ![ours-car](ours-car.jpg)
177
+
178
+ - In-game ![ingame-car2](ingame-car2.png)
179
+
180
+ - Ours ![ours-car2](ours-car2.png)
181
+
182
+ - In-game ![ingame-roads](ingame-roads.png)
183
+
184
+ - Ours ![ours-roads](ours-roads.png)
185
+
186
+ - In-game ![ingame-roads2](ingame-roads2.png)
187
+
188
+ - Ours ![ours-roads2](ours-roads2.png)
189
+
190
+ ## Environmental Impact
191
+
192
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
193
+
194
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
195
+
196
+ - **Hardware Type:** 2 x Nvidia T4 16GB GPUs
197
+ - **Hours used:** 210 (per GPU); 420 (combined)
198
+ - **Cloud Provider:** Kaggle
199
+ - **Compute Region:** US
200
+ - **Carbon Emitted:** 8.82 kg CO2
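For readers who want to sanity-check the emissions figure, the estimate is energy used multiplied by the carbon intensity of the grid. The sketch below assumes each T4 draws its 70 W rated board power for the whole run and that the grid emits 0.3 kg CO2 per kWh. Both values are assumptions for illustration (the card does not state which inputs were used), and `estimate_emissions_kg` is a hypothetical helper.

```python
# Back-of-the-envelope emissions estimate: energy (kWh) x grid carbon intensity.
# Assumptions (not stated in the card): each T4 runs at its 70 W rated power
# for the full duration, and the grid emits 0.3 kg CO2 per kWh.
GPU_HOURS = 420                      # 2 GPUs x 210 hours, from the card
T4_POWER_KW = 0.070                  # 70 W rated board power
CARBON_INTENSITY_KG_PER_KWH = 0.3    # hypothetical grid intensity


def estimate_emissions_kg(gpu_hours, power_kw, intensity_kg_per_kwh):
    """Return estimated emissions in kg CO2 for the given GPU time."""
    energy_kwh = gpu_hours * power_kw
    return energy_kwh * intensity_kg_per_kwh


emissions = estimate_emissions_kg(GPU_HOURS, T4_POWER_KW, CARBON_INTENSITY_KG_PER_KWH)
print(f"{emissions:.2f} kg CO2")
```

With these assumed inputs the estimate comes to about 8.82 kg CO2, consistent with the reported value; a different region or measured power draw would change the result.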
201
+
202
+ ## Technical Specifications
203
+
204
+ ### Model Architecture and Objective
205
+
206
+ RealFormer is a Transformer-based, low-latency generative style-transfer model that attempts to reconstruct each frame as a more photorealistic image.
207
+ The objective of RealFormer is to reproduce real-world detail as closely as possible, a level of fidelity that even current video games with highly detailed graphics do not reach.
208
+
209
  **Architecture:** The latest model, v2_1, introduces Location-based Multi-head Attention (LbMhA) to improve feature extraction with fewer parameters. Its three predecessors attained a similar level of accuracy without the LbMhA layers. The general architecture is as follows:
210
 
211
  ```python
 
212
  DataParallel(
213
  (module): ViTImage2Image(
214
  (patch_embed): Conv2d(3, 768, kernel_size=(16, 16), stride=(16, 16))
 
307
  )
308
  ```
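The `patch_embed` layer in the printout is a convolution whose kernel size and stride are both 16, which amounts to cutting the image into non-overlapping 16x16 patches and projecting each one to a 768-dimensional token. The sketch below works out the resulting token grid and the layer's parameter count. The 512x512 input size is a hypothetical choice (the card's actual input resolution is not shown in this diff), and the helper names are illustrative.

```python
# Shape and parameter arithmetic for Conv2d(3, 768, kernel_size=16, stride=16).
IN_CHANNELS, EMBED_DIM, PATCH = 3, 768, 16


def patch_grid(height, width, patch=PATCH):
    """Patches along each axis for a conv with stride == kernel_size and no padding."""
    return (height - patch) // patch + 1, (width - patch) // patch + 1


def patch_embed_params(in_ch=IN_CHANNELS, dim=EMBED_DIM, patch=PATCH):
    """Weights plus biases of the patch-embedding convolution."""
    return dim * in_ch * patch * patch + dim


rows, cols = patch_grid(512, 512)  # 512x512 is a hypothetical input size
print(f"tokens: {rows * cols}, each of dimension {EMBED_DIM}")
print(f"patch_embed parameters: {patch_embed_params():,}")
```

Doubling the patch size would cut the token count by a factor of four, which is the main lever for trading detail against attention cost in this kind of architecture.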
309
 
310
  ### Compute Infrastructure
311
 
312
  #### Hardware
313
 
314
+ 2 x Nvidia T4 16GB GPUs
315
 
316
  #### Software
317
 
318
+ - PyTorch
319
+ - torchvision
320
+ - einops
321
+ - numpy
322
+ - PIL (Python Imaging Library)
323
+ - matplotlib (for visualization)
324
 
325
+ ## Model Card Authors
326
 
327
+ Alosh Denny
328
 
329
  ## Model Card Contact
330
 
331