Upload 9 files

Files changed (10)

.gitattributes +1 -0
README.md +165 -0
config.json +111 -0
demo.png +3 -0
model.safetensors +3 -0
onnx/model.onnx +3 -0
onnx/model_quantized.onnx +3 -0
preprocessor_config.json +23 -0
pytorch_model.bin +3 -0
quantize_config.json +33 -0

.gitattributes CHANGED

@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+demo.png filter=lfs diff=lfs merge=lfs -text

README.md ADDED

	@@ -0,0 +1,165 @@

+---
+language: en
+library_name: transformers
+tags:
+  - vision
+  - image-segmentation
+  - nvidia/mit-b5
+  - transformers.js
+  - onnx
+datasets:
+  - celebamaskhq
+---
+# Face Parsing
+![example image and output](demo.png)
+[Semantic segmentation](https://huggingface.co/docs/transformers/tasks/semantic_segmentation) model fine-tuned from [nvidia/mit-b5](https://huggingface.co/nvidia/mit-b5) with [CelebAMask-HQ](https://github.com/switchablenorms/CelebAMask-HQ) for face parsing. For additional options, see the Transformers [Segformer docs](https://huggingface.co/docs/transformers/model_doc/segformer).
+> ONNX model for web inference contributed by [Xenova](https://huggingface.co/Xenova).
+## Usage in Python
+The exhaustive list of labels can be extracted from [config.json](https://huggingface.co/jonathandinu/face-parsing/blob/65972ac96180b397f86fda0980bbe68e6ee01b8f/config.json#L30).
+| id  | label      | note              |
+| :-: | :--------- | :---------------- |
+|  0  | background |                   |
+|  1  | skin       |                   |
+|  2  | nose       |                   |
+|  3  | eye_g      | eyeglasses        |
+|  4  | l_eye      | left eye          |
+|  5  | r_eye      | right eye         |
+|  6  | l_brow     | left eyebrow      |
+|  7  | r_brow     | right eyebrow     |
+|  8  | l_ear      | left ear          |
+|  9  | r_ear      | right ear         |
+| 10  | mouth      | area between lips |
+| 11  | u_lip      | upper lip         |
+| 12  | l_lip      | lower lip         |
+| 13  | hair       |                   |
+| 14  | hat        |                   |
+| 15  | ear_r      | earring           |
+| 16  | neck_l     | necklace          |
+| 17  | neck       |                   |
+| 18  | cloth      | clothing          |
+```python
+import torch
+from torch import nn
+from transformers import SegformerImageProcessor, SegformerForSemanticSegmentation
+from PIL import Image
+import matplotlib.pyplot as plt
+import requests
+# convenience expression for automatically determining device
+device = (
+    "cuda"
+    # Device for NVIDIA or AMD GPUs
+    if torch.cuda.is_available()
+    else "mps"
+    # Device for Apple Silicon (Metal Performance Shaders)
+    if torch.backends.mps.is_available()
+    else "cpu"
+)
+# load models
+image_processor = SegformerImageProcessor.from_pretrained("jonathandinu/face-parsing")
+model = SegformerForSemanticSegmentation.from_pretrained("jonathandinu/face-parsing")
+model.to(device)
+# expects a PIL.Image or torch.Tensor
+url = "https://images.unsplash.com/photo-1539571696357-5a69c17a67c6"
+image = Image.open(requests.get(url, stream=True).raw)
+# run inference on image
+inputs = image_processor(images=image, return_tensors="pt").to(device)
+outputs = model(**inputs)
+logits = outputs.logits  # shape (batch_size, num_labels, ~height/4, ~width/4)
+# resize output to match input image dimensions
+upsampled_logits = nn.functional.interpolate(
+    logits,
+    size=image.size[::-1],  # PIL size is (W, H); reversed to (H, W)
+    mode="bilinear",
+    align_corners=False,
+)
+# get label masks
+labels = upsampled_logits.argmax(dim=1)[0]
+# move to CPU to visualize in matplotlib
+labels_viz = labels.cpu().numpy()
+plt.imshow(labels_viz)
+plt.show()
+```
+## Usage in the browser (Transformers.js)
+```js
+import {
+  pipeline,
+  env,
+} from "https://cdn.jsdelivr.net/npm/@xenova/transformers";
+// important to prevent errors since the model files are likely remote on HF hub
+env.allowLocalModels = false;
+// instantiate image segmentation pipeline with pretrained face parsing model
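+const model = await pipeline("image-segmentation", "jonathandinu/face-parsing");
+// example image URL, the same one used in the Python section
+const url = "https://images.unsplash.com/photo-1539571696357-5a69c17a67c6";
+// async inference since it could take a few seconds
+const output = await model(url);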
+// each label is a separate mask object
+// [
+//   { score: null, label: 'background', mask: transformers.js RawImage { ... }}
+//   { score: null, label: 'hair', mask: transformers.js RawImage { ... }}
+//    ...
+// ]
+for (const m of output) {
+  console.log(`Found ${m.label}`);
+  m.mask.save(`${m.label}.png`);
+}
+```
+### p5.js
+Since [p5.js](https://p5js.org/) uses an animation loop abstraction, we need to take care when loading the model and making predictions.
+```js
+// ...
+// asynchronously load transformers.js and instantiate model
+async function preload() {
+  // load transformers.js library with a dynamic import
+  const { pipeline, env } = await import(
+    "https://cdn.jsdelivr.net/npm/@xenova/transformers"
+  );
+  // important to prevent errors since the model files are remote on HF hub
+  env.allowLocalModels = false;
+  // instantiate image segmentation pipeline with pretrained face parsing model
+  model = await pipeline("image-segmentation", "jonathandinu/face-parsing");
+  print("face-parsing model loaded");
+}
+// ...
+```
+[full p5.js example](https://editor.p5js.org/jonathan.ai/sketches/wZn15Dvgh)
+### Model Description
+- **Developed by:** [Jonathan Dinu](https://twitter.com/jonathandinu)
+- **Model type:** Transformer-based semantic segmentation image model
+- **License:** non-commercial research and educational purposes
+- **Resources for more information:** Transformers docs on [Segformer](https://huggingface.co/docs/transformers/model_doc/segformer) and/or the [original research paper](https://arxiv.org/abs/2105.15203).
+## Limitations and Bias
+### Bias
+While the capabilities of computer vision models are impressive, they can also reinforce or exacerbate social biases. The [CelebAMask-HQ](https://github.com/switchablenorms/CelebAMask-HQ) dataset used for fine-tuning is large but not necessarily diverse or representative, and it consists solely of images of celebrities.
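
Note on the README's Python example: `labels` holds one class id per pixel, so a single facial region can be isolated by comparing against its id from the label table. A minimal sketch of that idea, using a tiny hand-written label map in place of `labels_viz` (nested lists rather than an array, to stay dependency-free). `binary_mask` and `pixel_counts` are hypothetical helper names, not part of the model repository.

```python
from collections import Counter

# Subset of id2label from config.json in this commit.
ID2LABEL = {0: "background", 1: "skin", 2: "nose", 13: "hair"}
LABEL2ID = {label: idx for idx, label in ID2LABEL.items()}


def binary_mask(label_map, label):
    """Return a 0/1 mask marking the pixels that carry the given label."""
    target = LABEL2ID[label]
    return [[1 if value == target else 0 for value in row] for row in label_map]


def pixel_counts(label_map):
    """Count how many pixels were assigned to each label name."""
    counts = Counter(value for row in label_map for value in row)
    return {ID2LABEL[idx]: n for idx, n in counts.items()}


if __name__ == "__main__":
    # Hand-written 3x4 stand-in for a real segmentation result.
    label_map = [
        [0, 13, 13, 0],
        [0, 1, 1, 0],
        [0, 1, 2, 0],
    ]
    print(binary_mask(label_map, "hair"))
    print(pixel_counts(label_map))
```

With the tensor from the README, the equivalent comparison is `labels == 13` for the hair mask.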

config.json ADDED

	@@ -0,0 +1,111 @@

+{
+  "_name_or_path": "jonathandinu/face-parsing",
+  "architectures": [
+    "SegformerForSemanticSegmentation"
+  ],
+  "attention_probs_dropout_prob": 0.0,
+  "classifier_dropout_prob": 0.1,
+  "decoder_hidden_size": 768,
+  "depths": [
+    3,
+    6,
+    40,
+    3
+  ],
+  "downsampling_rates": [
+    1,
+    4,
+    8,
+    16
+  ],
+  "drop_path_rate": 0.1,
+  "hidden_act": "gelu",
+  "hidden_dropout_prob": 0.0,
+  "hidden_sizes": [
+    64,
+    128,
+    320,
+    512
+  ],
+  "id2label": {
+    "0": "background",
+    "1": "skin",
+    "2": "nose",
+    "3": "eye_g",
+    "4": "l_eye",
+    "5": "r_eye",
+    "6": "l_brow",
+    "7": "r_brow",
+    "8": "l_ear",
+    "9": "r_ear",
+    "10": "mouth",
+    "11": "u_lip",
+    "12": "l_lip",
+    "13": "hair",
+    "14": "hat",
+    "15": "ear_r",
+    "16": "neck_l",
+    "17": "neck",
+    "18": "cloth"
+  },
+  "image_size": 224,
+  "initializer_range": 0.02,
+  "label2id": {
+    "background": 0,
+    "skin": 1,
+    "nose": 2,
+    "eye_g": 3,
+    "l_eye": 4,
+    "r_eye": 5,
+    "l_brow": 6,
+    "r_brow": 7,
+    "l_ear": 8,
+    "r_ear": 9,
+    "mouth": 10,
+    "u_lip": 11,
+    "l_lip": 12,
+    "hair": 13,
+    "hat": 14,
+    "ear_r": 15,
+    "neck_l": 16,
+    "neck": 17,
+    "cloth": 18
+  },
+  "layer_norm_eps": 1e-06,
+  "mlp_ratios": [
+    4,
+    4,
+    4,
+    4
+  ],
+  "model_type": "segformer",
+  "num_attention_heads": [
+    1,
+    2,
+    5,
+    8
+  ],
+  "num_channels": 3,
+  "num_encoder_blocks": 4,
+  "patch_sizes": [
+    7,
+    3,
+    3,
+    3
+  ],
+  "reshape_last_stage": true,
+  "semantic_loss_ignore_index": 255,
+  "sr_ratios": [
+    8,
+    4,
+    2,
+    1
+  ],
+  "strides": [
+    4,
+    2,
+    2,
+    2
+  ],
+  "transformers_version": "4.37.0.dev0"
+}
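
Note on `config.json`: the `patch_sizes` and `strides` arrays determine how much each encoder stage shrinks its input, which is where the `~height/4, ~width/4` logits shape in the README comment comes from. The sketch below works through that arithmetic for the 512×512 input size set in `preprocessor_config.json`. It assumes each stage is a strided convolution with padding `patch_size // 2`, which is an assumption about the SegFormer patch-embedding layers and not something stated in this commit. `stage_resolutions` is a hypothetical helper name.

```python
# Values copied from config.json in this commit.
PATCH_SIZES = [7, 3, 3, 3]
STRIDES = [4, 2, 2, 2]


def stage_resolutions(size, patch_sizes=PATCH_SIZES, strides=STRIDES):
    """Return the spatial size after each encoder stage for a square input.

    Assumes every stage is a convolution with kernel `patch_size`, the given
    stride, and padding `patch_size // 2` (an assumption, see the note above).
    """
    sizes = []
    for kernel, stride in zip(patch_sizes, strides):
        padding = kernel // 2
        size = (size + 2 * padding - kernel) // stride + 1
        sizes.append(size)
    return sizes


if __name__ == "__main__":
    # 512 is the resize target in preprocessor_config.json.
    print(stage_resolutions(512))  # [128, 64, 32, 16]
```

Under these assumptions the first stage runs at 1/4 of the input resolution, which matches the README's shape comment and explains why the logits are upsampled before taking the argmax.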

demo.png ADDED

Git LFS Details

SHA256: 31c74d29ab9e45f3401f404f7bfc09e2cf9f5825611f07dc20b25d00eb1cac8a
Pointer size: 131 Bytes
Size of remote file: 645 kB

model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:c2bec795a8c243db71bd95be538fd62559003566466c71237e45c99b920f4b62
+size 338580732

onnx/model.onnx ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:6d4e67af60ff78184745ebf74cc15163c0adc27d45cdeba31e3a03d1096fb8c3
+size 340316611

onnx/model_quantized.onnx ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:5bab9bfb3cb979f3098ac3b934b1641dbf87f835e0b03c2ca6d88dcf18c83d27
+size 89439678
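
Note on the LFS entries: each large file in this commit is stored as a three-line Git LFS pointer (`version`, `oid`, `size`), so the pointers alone are enough to compare file sizes. The sketch below parses the two ONNX pointers shown in this diff and computes how much smaller the quantized model is. `parse_lfs_pointer` is a hypothetical helper, not part of any Git LFS tooling.

```python
def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    fields["size"] = int(fields["size"])  # size is given in bytes
    return fields


# Pointer contents copied from this diff.
FULL = """version https://git-lfs.github.com/spec/v1
oid sha256:6d4e67af60ff78184745ebf74cc15163c0adc27d45cdeba31e3a03d1096fb8c3
size 340316611"""

QUANTIZED = """version https://git-lfs.github.com/spec/v1
oid sha256:5bab9bfb3cb979f3098ac3b934b1641dbf87f835e0b03c2ca6d88dcf18c83d27
size 89439678"""

if __name__ == "__main__":
    full = parse_lfs_pointer(FULL)
    quantized = parse_lfs_pointer(QUANTIZED)
    ratio = quantized["size"] / full["size"]
    print(f"quantized model is {ratio:.1%} of the full ONNX model")
```

The quantized file comes out at roughly a quarter of the full model's size, which is what one would expect when 32-bit weights are stored as 8-bit integers.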

preprocessor_config.json ADDED

	@@ -0,0 +1,23 @@

+{
+  "do_normalize": true,
+  "do_reduce_labels": false,
+  "do_rescale": true,
+  "do_resize": true,
+  "image_mean": [
+    0.485,
+    0.456,
+    0.406
+  ],
+  "image_processor_type": "SegformerFeatureExtractor",
+  "image_std": [
+    0.229,
+    0.224,
+    0.225
+  ],
+  "resample": 2,
+  "rescale_factor": 0.00392156862745098,
+  "size": {
+    "height": 512,
+    "width": 512
+  }
+}
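
Note on `preprocessor_config.json`: with `do_rescale` and `do_normalize` both enabled, each 8-bit channel value is first multiplied by `rescale_factor` (1/255) and then standardized with the per-channel `image_mean` and `image_std`. A minimal sketch of that arithmetic for a single RGB pixel, written in plain Python rather than taken from the image processor's code. `normalize_pixel` is a hypothetical name used only for illustration.

```python
# Values copied from preprocessor_config.json in this commit.
RESCALE_FACTOR = 0.00392156862745098  # 1 / 255
IMAGE_MEAN = [0.485, 0.456, 0.406]
IMAGE_STD = [0.229, 0.224, 0.225]


def normalize_pixel(rgb):
    """Rescale an 8-bit RGB pixel to [0, 1], then standardize each channel."""
    return [
        (value * RESCALE_FACTOR - mean) / std
        for value, mean, std in zip(rgb, IMAGE_MEAN, IMAGE_STD)
    ]


if __name__ == "__main__":
    print(normalize_pixel((0, 0, 0)))        # darkest possible pixel
    print(normalize_pixel((255, 255, 255)))  # brightest possible pixel
```

The mean and standard deviation values are the usual ImageNet statistics, so inputs prepared by hand should use the same constants to match what the model saw during fine-tuning.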

pytorch_model.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:e0139f52e953a00ca01d86faf7363f067a535291a003c096dd9c56b09d8945f1
+size 338821701

quantize_config.json ADDED

	@@ -0,0 +1,33 @@

+{
+    "per_channel": true,
+    "reduce_range": true,
+    "per_model_config": {
+        "model": {
+            "op_types": [
+                "Unsqueeze",
+                "Shape",
+                "Transpose",
+                "Sqrt",
+                "Gather",
+                "Slice",
+                "Erf",
+                "Div",
+                "Reshape",
+                "Add",
+                "Cast",
+                "Sub",
+                "Concat",
+                "ReduceMean",
+                "Mul",
+                "Conv",
+                "Constant",
+                "Resize",
+                "Softmax",
+                "Pow",
+                "Relu",
+                "MatMul"
+            ],
+            "weight_type": "QUInt8"
+        }
+    }
+}
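
Note on `quantize_config.json`: `weight_type: QUInt8` together with `per_channel` and `reduce_range` describes unsigned 8-bit weight quantization with a separate scale per output channel and a narrowed integer range. The sketch below illustrates that idea in plain Python. It is a simplified illustration, not the code of the tool that produced `model_quantized.onnx`; the function names are hypothetical, and treating `reduce_range` as limiting values to 0..127 is an assumption.

```python
def quantize_channel(weights, reduce_range=True):
    """Asymmetric unsigned 8-bit quantization of one channel's weights."""
    qmin, qmax = 0, (127 if reduce_range else 255)  # assumed reduced range
    # Include 0.0 in the range so that zero is exactly representable.
    rmin, rmax = min(0.0, min(weights)), max(0.0, max(weights))
    scale = (rmax - rmin) / (qmax - qmin) or 1.0  # avoid a zero scale
    zero_point = round(qmin - rmin / scale)
    quantized = [
        max(qmin, min(qmax, round(w / scale) + zero_point)) for w in weights
    ]
    return quantized, scale, zero_point


def dequantize_channel(quantized, scale, zero_point):
    """Map quantized integers back to approximate float weights."""
    return [(q - zero_point) * scale for q in quantized]


def quantize_per_channel(matrix, reduce_range=True):
    """Quantize each row (output channel) with its own scale and zero point."""
    return [quantize_channel(row, reduce_range) for row in matrix]


if __name__ == "__main__":
    # Two channels with very different value ranges.
    weights = [[-1.0, 0.0, 0.5, 1.0], [0.0, 0.01, 0.02, 0.03]]
    for q, scale, zp in quantize_per_channel(weights):
        print(q, zp, dequantize_channel(q, scale, zp))
```

Giving each channel its own scale keeps small-valued channels, like the second row here, from being crushed into a handful of integer levels by a range chosen for the largest channel.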