aoxo /
Image-to-Image · English · art
aoxo committed on Commit 3f5814e · verified · 1 Parent(s): b1630fd

Update README.md

Files changed (1)
  1. README.md +3 -1
README.md CHANGED
@@ -119,10 +119,12 @@ Here's a more concise version of your original paragraph, maintaining the essent
  ---
 
  **Preprocessing of Large-Scale Image Data for Photorealism Enhancement**
- This section details our methodology for preprocessing a large-scale dataset of approximately 117 million game-rendered frames from 9 AAA video games and 1.24 billion real-world images from Mapillary Vistas and Cityscapes, all in 4K resolution. The goal is to pair game frames with real images that exhibit the highest cosine similarity based on structural and visual features, ensuring alignment of fine details like object positions.
+ This section details our methodology for preprocessing a large-scale dataset of approximately **117 million game-rendered frames** from **9 AAA video games** and **1.24 billion real-world images** from Mapillary Vistas and Cityscapes, all in 4K resolution. The goal is to pair game frames with real images that exhibit the highest cosine similarity based on structural and visual features, ensuring alignment of fine details like object positions, level of detail and motion blur.
 
  Images and their corresponding style semantic maps were resized to **512 x 512** pixels and corrected to a **24-bit** depth (3 channels) if they exceeded this depth. We employ a novel **feature-mapped channel-split PSNR matching** approach using **EfficientNet** feature extraction, channel splitting, and dual metric computation of PSNR and cosine similarity. **Locality-Sensitive Hashing** (LSH) aids in efficiently identifying the **top-10 nearest neighbors** for each frame. This resulted in a massive dataset of **1.17** billion frame-image pairs and **12.4 billion** image-frame pairs. The final selection process involves assessing similarity consistency across channels to ensure accurate pairings. This scalable preprocessing pipeline enables efficient pairing while preserving critical visual details, laying the foundation for subsequent **contrastive learning** to enhance **photorealism in game-rendered frames**.
 
+ ![preprocessing](preprocessing.png)
+
  #### Training Hyperparameters
 
  **v1**
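
The updated preprocessing paragraph names the matching metric but does not show it. Below is a minimal sketch of the channel-split PSNR and cosine comparison, assuming a torchvision EfficientNet-B0 backbone and 512 x 512 RGB inputs as described above; the helper names, the PSNR peak definition on feature maps, and the consistency threshold are illustrative assumptions, not the repository's actual code.

```python
# Minimal sketch of channel-split PSNR + cosine matching on EfficientNet feature maps.
# Assumptions (not from the repository): EfficientNet-B0 from torchvision, illustrative
# PSNR peak definition, and an illustrative consistency threshold.
import torch
import torch.nn.functional as F
from torchvision import models, transforms
from PIL import Image

# Resize to 512 x 512 and force 3-channel (24-bit) RGB, as described in the README.
preprocess = transforms.Compose([
    transforms.Lambda(lambda im: im.convert("RGB")),
    transforms.Resize((512, 512)),
    transforms.ToTensor(),
])

# Truncated EfficientNet used purely as a feature mapper (no classifier head).
backbone = models.efficientnet_b0(
    weights=models.EfficientNet_B0_Weights.DEFAULT
).features.eval()

@torch.no_grad()
def feature_map(path: str) -> torch.Tensor:
    x = preprocess(Image.open(path)).unsqueeze(0)   # (1, 3, 512, 512)
    return backbone(x).squeeze(0)                   # (C, H, W) feature map

def psnr(a: torch.Tensor, b: torch.Tensor) -> float:
    # PSNR over a single feature channel; the peak is taken from the data itself.
    mse = F.mse_loss(a, b).clamp(min=1e-12)
    peak = torch.maximum(a.abs().max(), b.abs().max()).clamp(min=1e-8)
    return float(10.0 * torch.log10(peak ** 2 / mse))

@torch.no_grad()
def channel_split_scores(game_frame: str, real_image: str):
    """Per-channel PSNR and cosine similarity between two feature maps."""
    fa, fb = feature_map(game_frame), feature_map(real_image)
    psnrs = [psnr(ca, cb) for ca, cb in zip(fa, fb)]              # split across channels
    cosines = [float(F.cosine_similarity(ca.flatten(), cb.flatten(), dim=0))
               for ca, cb in zip(fa, fb)]
    return psnrs, cosines

# A pair is kept only if similarity is consistent across channels, e.g. a small
# spread of per-channel cosine scores (threshold is illustrative).
def is_consistent(cosines, max_spread: float = 0.1) -> bool:
    return (max(cosines) - min(cosines)) <= max_spread
```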
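Likewise, the LSH shortlist step can be sketched with random-hyperplane hashing over pooled feature vectors, followed by exact cosine re-ranking to pick the top-10 neighbors. The 8-bit hash width, the single hash table, and the 1280-dimensional pooled feature size are assumptions for illustration; a production index would typically use several hash tables or a dedicated ANN library.

```python
# Minimal random-hyperplane LSH sketch for the top-10 neighbor search described above.
# Bucket width, single-table layout, and the brute-force re-ranking inside a bucket
# are illustrative choices, not the repository's implementation.
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(0)

class CosineLSH:
    """Random-hyperplane LSH: vectors with high cosine similarity tend to share a key."""

    def __init__(self, dim: int, n_bits: int = 8):
        # Each bit is the sign of a projection onto a random hyperplane.
        self.planes = rng.standard_normal((n_bits, dim))
        self.buckets = defaultdict(list)          # hash key -> [(item_id, unit_vector), ...]

    def _key(self, v: np.ndarray) -> bytes:
        return ((self.planes @ v) > 0).tobytes()  # sign pattern across the hyperplanes

    def add(self, item_id: str, v: np.ndarray) -> None:
        v = v / (np.linalg.norm(v) + 1e-12)
        self.buckets[self._key(v)].append((item_id, v))

    def top_k(self, v: np.ndarray, k: int = 10):
        # Shortlist = the query's bucket; re-rank candidates by exact cosine similarity.
        v = v / (np.linalg.norm(v) + 1e-12)
        candidates = self.buckets.get(self._key(v), [])
        scored = sorted(((float(u @ v), item_id) for item_id, u in candidates), reverse=True)
        return [(item_id, score) for score, item_id in scored[:k]]

# Illustrative usage: index pooled real-image features, then query with a game-frame feature.
index = CosineLSH(dim=1280)                       # EfficientNet-B0 pooled features are 1280-d
for i in range(1000):
    index.add(f"real_{i}", rng.standard_normal(1280))
print(index.top_k(rng.standard_normal(1280), k=10))
```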