Update README.md

README.md CHANGED
@@ -4,7 +4,6 @@ license_name: nvidia-open-model-license
license_link: https://developer.download.nvidia.com/licenses/nvidia-open-model-license-agreement-june-2024.pdf
---

-
# Model Overview

## Description

@@ -68,6 +67,41 @@ Huggingface: 03/26/2025 via [RADIO Collection of Models](https://huggingface.co/
**Output Parameters:** 2D <br>
**Other Properties Related to Output:** Downstream model required to leverage image features <br>

## Usage:

RADIO will return a tuple with two tensors.
The `summary` is similar to the `cls_token` in ViT and is meant to represent the general concept of the entire image.
It has shape `(B,C)` with `B` being the batch dimension, and `C` being some number of channels.
The `spatial_features` represent more localized content which should be suitable for dense tasks such as semantic segmentation, or for integration into an LLM.

```python
import torch
from PIL import Image
from transformers import AutoModel, CLIPImageProcessor

hf_repo = "nvidia/C-RADIOv2-B"

# Load the paired image processor and model (RADIO ships custom modeling code, hence trust_remote_code).
image_processor = CLIPImageProcessor.from_pretrained(hf_repo)
model = AutoModel.from_pretrained(hf_repo, trust_remote_code=True)
model.eval().cuda()

# Preprocess the input image into a batched pixel tensor.
image = Image.open('./assets/radio.png').convert('RGB')
pixel_values = image_processor(images=image, return_tensors='pt', do_resize=True).pixel_values
pixel_values = pixel_values.cuda()

# Forward pass: returns the (B, C) summary and the (B, T, D) spatial features.
summary, spatial_features = model(pixel_values)
```

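Since the `summary` embedding describes the image as a whole, one simple use is image-to-image similarity. The sketch below is illustrative rather than part of the model card: the second image path is a placeholder, and plain cosine similarity is just one reasonable choice of metric.

```python
import torch.nn.functional as F

# Placeholder path for a second image, used only for illustration.
other_image = Image.open('./assets/other_image.png').convert('RGB')
other_pixels = image_processor(images=other_image, return_tensors='pt', do_resize=True).pixel_values.cuda()

with torch.no_grad():
    other_summary, _ = model(other_pixels)

# Cosine similarity between the two (B, C) summary embeddings.
similarity = F.cosine_similarity(summary, other_summary)  # shape: (B,)
print(similarity.item())
```
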
Spatial features have shape `(B,T,D)` with `T` being the flattened spatial tokens, and `D` being the channels for spatial features. Note that `C != D` in general.
Converting to a spatial tensor format can be done using the downsampling size of the model, combined with the input tensor shape. For RADIO, the patch size is 16.

```python
from einops import rearrange

patch_size = 16  # RADIO uses a patch size of 16
spatial_features = rearrange(spatial_features, 'b (h w) d -> b d h w',
                             h=pixel_values.shape[-2] // patch_size, w=pixel_values.shape[-1] // patch_size)
```

The resulting tensor will have shape `(B,D,H,W)`, as is typically seen with computer vision models.
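To give a sense of how a downstream model might consume these features, here is a minimal sketch: the 1x1 convolution head and the class count are illustrative placeholders (randomly initialized and untrained), not something provided by this repository.

```python
import torch.nn as nn
import torch.nn.functional as F

num_classes = 5                        # placeholder class count for illustration
feat_dim = spatial_features.shape[1]   # D

# Minimal dense head: a 1x1 convolution over the (B, D, H, W) feature map.
head = nn.Conv2d(feat_dim, num_classes, kernel_size=1).cuda()

with torch.no_grad():
    logits = head(spatial_features)  # (B, num_classes, H, W)
    # Upsample to the input resolution to obtain per-pixel logits.
    logits = F.interpolate(logits, size=pixel_values.shape[-2:], mode='bilinear', align_corners=False)
```

In practice such a head would be trained on task-specific data; only the feature extraction above comes from the model card.
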
## Software Integration

**Runtime Engine(s):**

@@ -192,4 +226,3 @@ Model Application(s): | Generation of visual embe
Describe the life critical impact (if present). | Not Applicable
Use Case Restrictions: | Abide by NVIDIA Open Model License Agreement
Model and dataset restrictions: | The Principle of least privilege (PoLP) is applied, limiting access for dataset generation and model development. Restrictions enforce dataset access during training, and dataset license constraints are adhered to.
-