---
license: other
license_name: flux-1-dev-non-commercial-license
license_link: https://huggingface.co/black-forest-labs/FLUX.1-dev/blob/main/LICENSE.md
language:
- en
library_name: diffusers
pipeline_tag: text-to-image
tags:
- Text-to-Image
- ControlNet
- Diffusers
- Flux.1-dev
- image-generation
- Stable Diffusion
base_model: black-forest-labs/FLUX.1-dev
---

## RepText

We present RepText, which aims to empower pre-trained monolingual text-to-image generation models with the ability to accurately render, or more precisely, replicate, multilingual visual text in user-specified fonts, without the need to truly understand the languages involved. Specifically, we adopt the ControlNet setting and additionally integrate language-agnostic glyph and position information for the rendered text, enabling the generation of harmonized visual text and allowing users to customize text content, font, and position as needed. To improve accuracy, a text perceptual loss is employed alongside the diffusion loss. Furthermore, to stabilize the rendering process at inference time, we initialize directly from a noisy glyph latent instead of random noise, and adopt region masks that restrict feature injection to the text region, avoiding distortion of the background. We conducted extensive experiments to verify the effectiveness of RepText relative to existing works: our approach outperforms existing open-source methods and achieves results comparable to native multilingual closed-source models.
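
As a rough illustration of the noisy glyph initialization described above, the sketch below encodes the rendered glyph canvas into the latent space and blends it with Gaussian noise at the starting noise level. The helper name, its signature, and the flow-matching blend are our assumptions for illustration only; the actual logic lives in the released inference code.

```python
# Minimal sketch of noisy glyph latent initialization, NOT the official
# implementation (see pipeline_flux_controlnet.py in the RepText repo).
import torch

def init_noisy_glyph_latents(vae, glyph_image, sigma_start, generator=None):
    # Hypothetical helper: `glyph_image` is the rendered glyph canvas as a
    # [-1, 1] image tensor of shape (B, 3, H, W) on the same device as `vae`.
    latents = vae.encode(glyph_image).latent_dist.sample(generator=generator)
    # FLUX VAEs shift and scale latents before denoising.
    latents = (latents - vae.config.shift_factor) * vae.config.scaling_factor
    noise = torch.randn(latents.shape, generator=generator,
                        device=latents.device, dtype=latents.dtype)
    # Flow-matching-style blend: start from the glyph latent plus noise
    # rather than from pure Gaussian noise.
    return sigma_start * noise + (1.0 - sigma_start) * latents
```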

<div align="center">
<img src='assets/example1.png' width=1024>
</div>

## ⭐ Update
- [2025/06/07] [Model Weights](https://huggingface.co/Shakker-Labs/RepText) and inference code released!
- [2025/04/28] [Technical Report](https://arxiv.org/abs/2504.19724) released!

## Usage
Please refer to [GitHub](https://github.com/Shakker-Labs/RepText) for details. The `controlnet_flux` and `pipeline_flux_controlnet` modules imported below are provided in that repository.

```python
import torch
from controlnet_flux import FluxControlNetModel
from pipeline_flux_controlnet import FluxControlNetPipeline

from PIL import Image, ImageDraw, ImageFont
import numpy as np
import cv2
import re
import os

def contains_chinese(text):
    return re.search(r'[\u4e00-\u9fff]', text) is not None

def canny(img):
    low_threshold = 50
    high_threshold = 100
    img = cv2.Canny(img, low_threshold, high_threshold)
    img = img[:, :, None]
    img = 255 - np.concatenate([img, img, img], axis=2)  # invert: black edges on white
    return img

base_model = "black-forest-labs/FLUX.1-dev"
controlnet_model = "Shakker-Labs/RepText"

controlnet = FluxControlNetModel.from_pretrained(controlnet_model, torch_dtype=torch.bfloat16)
pipe = FluxControlNetPipeline.from_pretrained(
    base_model, controlnet=controlnet, torch_dtype=torch.bfloat16
).to("cuda")

## set resolution
width, height = 1024, 1024

## set font
font_path = "./assets/Arial_Unicode.ttf" # use your own font
font_size = 80 # a font size >= 60 is recommended
font = ImageFont.truetype(font_path, font_size)

## set text content, position, color
text_list = ["哩布哩布"]
text_position_list = [(370, 200)]
text_color_list = [(255, 255, 255)]

## set controlnet conditions
control_image_list = [] # canny list
control_position_list = [] # position list
control_mask_list = [] # regional mask list
control_glyph_all = np.zeros([height, width, 3], dtype=np.uint8) # all glyphs

## handle each line of text
for text, text_position, text_color in zip(text_list, text_position_list, text_color_list):

    ### glyph image: render the text onto a black background
    control_image_glyph = Image.new("RGB", (width, height), (0, 0, 0))
    draw = ImageDraw.Draw(control_image_glyph)
    draw.text(text_position, text, font=font, fill=text_color)

    ### get the bounding box of the rendered text
    bbox = draw.textbbox(text_position, text, font=font)

    ### position condition: white rectangle over the text area
    control_position = np.zeros([height, width], dtype=np.uint8)
    control_position[bbox[1]:bbox[3], bbox[0]:bbox[2]] = 255
    control_position = Image.fromarray(control_position)
    control_position_list.append(control_position)

    ### regional mask: the text area padded by 5 px, clamped to the image bounds
    control_mask_np = np.zeros([height, width], dtype=np.uint8)
    control_mask_np[max(bbox[1] - 5, 0):bbox[3] + 5, max(bbox[0] - 5, 0):bbox[2] + 5] = 255
    control_mask = Image.fromarray(control_mask_np)
    control_mask_list.append(control_mask)

    ### accumulate glyphs (np.maximum avoids uint8 overflow where glyphs overlap)
    control_glyph = np.array(control_image_glyph)
    control_glyph_all = np.maximum(control_glyph_all, control_glyph)

    ### canny condition
    control_image = canny(cv2.cvtColor(np.array(control_image_glyph), cv2.COLOR_RGB2BGR))
    control_image = Image.fromarray(cv2.cvtColor(control_image, cv2.COLOR_BGR2RGB))
    control_image_list.append(control_image)

control_glyph_all = Image.fromarray(control_glyph_all).convert("RGB")
# control_glyph_all.save("./results/control_glyph.jpg")

# it is recommended to use words such as 'sign', 'billboard', 'banner' in your prompt
# for English text, it helps to add the text itself to the prompt
prompt = "a street sign in city"
for text in text_list:
    if not contains_chinese(text):
        prompt += f", '{text}'"
prompt += ", filmfotos, film grain, reversal film photography" # optional
print(prompt)

generator = torch.Generator(device="cuda").manual_seed(42)

image = pipe(
    prompt,
    control_image=control_image_list, # canny
    control_position=control_position_list, # position
    control_mask=control_mask_list, # regional mask
    control_glyph=control_glyph_all, # used as the init latent; optional, set to None to disable
    controlnet_conditioning_scale=1.0,
    controlnet_conditioning_step=30,
    width=width,
    height=height,
    num_inference_steps=30,
    guidance_scale=3.5,
    generator=generator,
).images[0]

os.makedirs("./results", exist_ok=True)
image.save("./results/result.jpg")
```
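
Since each conditioning list carries one entry per line of text, rendering several strings at once only requires extending the three input lists in lockstep; the loop above then builds a canny image, a position map, and a regional mask per entry. The positions and colors below are illustrative:

```python
# Two text lines instead of one; everything else in the script stays the same.
text_list = ["哩布哩布", "RepText"]
text_position_list = [(370, 200), (300, 520)]
text_color_list = [(255, 255, 255), (255, 200, 0)]
```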

## 📑 Citation
If you find RepText useful for your research and applications, please cite us using this BibTeX:
```bibtex
@article{wang2025reptext,
  title={RepText: Rendering Visual Text via Replicating},
  author={Wang, Haofan and Xu, Yujia and Li, Yimeng and Li, Junchen and Zhang, Chaowei and Wang, Jing and Yang, Kejia and Chen, Zhibo},
  journal={arXiv preprint arXiv:2504.19724},
  year={2025}
}
```

## 📧 Contact
If you have any questions, please feel free to reach us at `[email protected]`.