aoxo /
Image-to-Image · English · art
aoxo committed on Commit 3f5814e · verified · 1 Parent(s): b1630fd

Update README.md

Files changed (1)
  1. README.md +3 -1
README.md CHANGED
@@ -119,10 +119,12 @@ Here's a more concise version of your original paragraph, maintaining the essent
  ---
 
  **Preprocessing of Large-Scale Image Data for Photorealism Enhancement**
- This section details our methodology for preprocessing a large-scale dataset of approximately 117 million game-rendered frames from 9 AAA video games and 1.24 billion real-world images from Mapillary Vistas and Cityscapes, all in 4K resolution. The goal is to pair game frames with real images that exhibit the highest cosine similarity based on structural and visual features, ensuring alignment of fine details like object positions.
+ This section details our methodology for preprocessing a large-scale dataset of approximately **117 million game-rendered frames** from **9 AAA video games** and **1.24 billion real-world images** from Mapillary Vistas and Cityscapes, all in 4K resolution. The goal is to pair game frames with real images that exhibit the highest cosine similarity based on structural and visual features, ensuring alignment of fine details like object positions, level of detail and motion blur.
 
  Images and their corresponding style semantic maps were resized to **512 x 512** pixels and corrected to a **24-bit** depth (3 channels) if they exceeded this depth. We employ a novel **feature-mapped channel-split PSNR matching** approach using **EfficientNet** feature extraction, channel splitting, and dual metric computation of PSNR and cosine similarity. **Locality-Sensitive Hashing** (LSH) aids in efficiently identifying the **top-10 nearest neighbors** for each frame. This resulted in a massive dataset of **1.17** billion frame-image pairs and **12.4 billion** image-frame pairs. The final selection process involves assessing similarity consistency across channels to ensure accurate pairings. This scalable preprocessing pipeline enables efficient pairing while preserving critical visual details, laying the foundation for subsequent **contrastive learning** to enhance **photorealism in game-rendered frames**.
 
+ ![preprocessing](preprocessing.png)
+
  #### Training Hyperparameters
 
  **v1**
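
The updated preprocessing paragraph names the matching metric but does not show it. Below is a minimal sketch of the channel-split PSNR and cosine comparison, assuming a torchvision EfficientNet-B0 backbone and 512 x 512 RGB inputs as described above; the helper names, the PSNR peak definition on feature maps, and the consistency threshold are illustrative assumptions, not the repository's actual code.

```python
# Minimal sketch of channel-split PSNR + cosine matching on EfficientNet feature maps.
# Assumptions (not from the repository): EfficientNet-B0 from torchvision, illustrative
# PSNR peak definition, and an illustrative consistency threshold.
import torch
import torch.nn.functional as F
from torchvision import models, transforms
from PIL import Image

# Resize to 512 x 512 and force 3-channel (24-bit) RGB, as described in the README.
preprocess = transforms.Compose([
    transforms.Lambda(lambda im: im.convert("RGB")),
    transforms.Resize((512, 512)),
    transforms.ToTensor(),
])

# Truncated EfficientNet used purely as a feature mapper (no classifier head).
backbone = models.efficientnet_b0(
    weights=models.EfficientNet_B0_Weights.DEFAULT
).features.eval()

@torch.no_grad()
def feature_map(path: str) -> torch.Tensor:
    x = preprocess(Image.open(path)).unsqueeze(0)   # (1, 3, 512, 512)
    return backbone(x).squeeze(0)                   # (C, H, W) feature map

def psnr(a: torch.Tensor, b: torch.Tensor) -> float:
    # PSNR over a single feature channel; the peak is taken from the data itself.
    mse = F.mse_loss(a, b).clamp(min=1e-12)
    peak = torch.maximum(a.abs().max(), b.abs().max()).clamp(min=1e-8)
    return float(10.0 * torch.log10(peak ** 2 / mse))

@torch.no_grad()
def channel_split_scores(game_frame: str, real_image: str):
    """Per-channel PSNR and cosine similarity between two feature maps."""
    fa, fb = feature_map(game_frame), feature_map(real_image)
    psnrs = [psnr(ca, cb) for ca, cb in zip(fa, fb)]              # split across channels
    cosines = [float(F.cosine_similarity(ca.flatten(), cb.flatten(), dim=0))
               for ca, cb in zip(fa, fb)]
    return psnrs, cosines

# A pair is kept only if similarity is consistent across channels, e.g. a small
# spread of per-channel cosine scores (threshold is illustrative).
def is_consistent(cosines, max_spread: float = 0.1) -> bool:
    return (max(cosines) - min(cosines)) <= max_spread
```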
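Likewise, the LSH shortlist step can be sketched with random-hyperplane hashing over pooled feature vectors, followed by exact cosine re-ranking to pick the top-10 neighbors. The 8-bit hash width, the single hash table, and the 1280-dimensional pooled feature size are assumptions for illustration; a production index would typically use several hash tables or a dedicated ANN library.

```python
# Minimal random-hyperplane LSH sketch for the top-10 neighbor search described above.
# Bucket width, single-table layout, and the brute-force re-ranking inside a bucket
# are illustrative choices, not the repository's implementation.
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(0)

class CosineLSH:
    """Random-hyperplane LSH: vectors with high cosine similarity tend to share a key."""

    def __init__(self, dim: int, n_bits: int = 8):
        # Each bit is the sign of a projection onto a random hyperplane.
        self.planes = rng.standard_normal((n_bits, dim))
        self.buckets = defaultdict(list)          # hash key -> [(item_id, unit_vector), ...]

    def _key(self, v: np.ndarray) -> bytes:
        return ((self.planes @ v) > 0).tobytes()  # sign pattern across the hyperplanes

    def add(self, item_id: str, v: np.ndarray) -> None:
        v = v / (np.linalg.norm(v) + 1e-12)
        self.buckets[self._key(v)].append((item_id, v))

    def top_k(self, v: np.ndarray, k: int = 10):
        # Shortlist = the query's bucket; re-rank candidates by exact cosine similarity.
        v = v / (np.linalg.norm(v) + 1e-12)
        candidates = self.buckets.get(self._key(v), [])
        scored = sorted(((float(u @ v), item_id) for item_id, u in candidates), reverse=True)
        return [(item_id, score) for score, item_id in scored[:k]]

# Illustrative usage: index pooled real-image features, then query with a game-frame feature.
index = CosineLSH(dim=1280)                       # EfficientNet-B0 pooled features are 1280-d
for i in range(1000):
    index.add(f"real_{i}", rng.standard_normal(1280))
print(index.top_k(rng.standard_normal(1280), k=10))
```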