Bokeh 3.5 Medium

Bokeh 3.5 Medium is a continued-training model built on the Stable Diffusion 3.5 Medium foundation, further refined on a rigorously curated, high-resolution open-source dataset of 5 million images. This yields outstanding image quality, fine detail preservation, and enhanced controllability.
This model is released under the Stability Community License. For more details and additional resources, visit Tensor.Art or TusiArt.
Overview
- Continued training on SD3.5M, leveraging a large-scale dataset of 5 million high-resolution images, carefully curated for aesthetic quality.
- Supports hybrid short/long caption training for enhanced natural-language understanding (a caption sketch follows this list).
- Short Captions: Focus on core image features.
- Long Captions: Provide broader scene context and atmospheric details.
- Recommended Resolutions: 1920x1024, 1728x1152, 1152x1728, 1280x1664, 1440x1440
- Best-Quality Training Resolution: 1440x1440
- Supports LoRA fine-tuning.
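As a rough illustration of the two caption styles, the pair below is invented for this example (not taken from the training dataset); either style also works as an inference prompt, with the long form carrying the extra scene and atmosphere cues described above.

```python
# Hypothetical caption pair illustrating the short vs. long styles
short_caption = "Close-up portrait of an elderly fisherman"

long_caption = (
    "Close-up portrait of an elderly fisherman with weathered skin and a grey beard, "
    "standing on a misty harbor pier at dawn, soft diffused light, shallow depth of field, "
    "muted blue and amber tones"
)
```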
Advantages
🖼️ High-Quality Image Generation
- State-of-the-art visual fidelity with improved detail extraction and aesthetic consistency.
- Enhanced resolution support up to roughly 2 million pixels (e.g., 1440x1440), ensuring highly detailed image outputs.
- Carefully curated dataset ensures better composition, lighting, and overall artistic appeal.
🎯 Powerful Custom Fine-Tuning
- Exceptional LoRA training support, making it highly effective for:
- Photography
- 3D Rendering
- Illustration
- Concept Art
⚡ Efficient Inference & Training
- Low hardware requirements for inference (a memory-saving sketch follows this list):
- Medium model: 9 GB VRAM (without the T5 text encoder)
- Full-weights inference: 16 GB VRAM (suitable for local deployment)
- LoRA fine-tuning VRAM requirement: 12-32 GB
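A minimal sketch of the reduced-VRAM path using diffusers, dropping the T5 text encoder and enabling CPU offload; the checkpoint path is a placeholder for wherever the Bokeh 3.5 Medium weights are stored locally.

```python
import torch
from diffusers import StableDiffusion3Pipeline

# Load without the T5 text encoder to reduce VRAM use (prompt adherence may degrade slightly)
pipe = StableDiffusion3Pipeline.from_pretrained(
    "path/to/bokeh_3.5_medium",  # placeholder: local path to the Bokeh 3.5 Medium weights
    text_encoder_3=None,
    tokenizer_3=None,
    torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()  # keeps submodules on the GPU only while they are needed

image = pipe(
    "Close-up of a macaw, dimly lit environment",
    num_inference_steps=28,
    guidance_scale=4,
).images[0]
image.save("macaw_low_vram.jpg")
```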
Known Issues
- Potential human anatomy inconsistencies.
- Limited ability to generate photorealistic images.
- Some concepts may suffer from aesthetic quality issues.
Prompting Guide
Use a structured prompt combining:
- Main subject (e.g., "Close-up of a macaw")
- Detailed features (e.g., "vivid feathers, sharp beak")
- Background environment (e.g., "dimly lit environment")
- Atmospheric description (e.g., "soft warm lighting, cinematic mood")
Best Practices:
- Avoid overly complex prompts, as the model already has strong text encoding. Overloading details can cause T5 hallucination artifacts, reducing image quality.
- Do not use excessively short prompts (e.g., single words or 2-3 tokens) unless combined with LoRA or Image2Image (i2i) techniques.
- Avoid mixing too many unrelated concepts, as this can lead to visual distortions and unwanted artifacts.
- Optimal token length: 30-70 tokens (a rough length check is sketched below).
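A small sketch of assembling a structured prompt from the four parts above and roughly checking its length; the CLIP tokenizer is used here only as an approximation, and counts will differ somewhat from the T5 tokenizer.

```python
from transformers import CLIPTokenizer

# Structured prompt parts following the guide above
subject = "Close-up of a macaw"
features = "vivid feathers, sharp beak"
background = "dimly lit environment"
atmosphere = "soft warm lighting, cinematic mood"
prompt = ", ".join([subject, features, background, atmosphere])

# Rough length check against the recommended 30-70 token range
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
num_tokens = len(tokenizer(prompt).input_ids)
print(f"{prompt!r} -> {num_tokens} tokens")
```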
Negative Prompting
- Negative prompts strongly influence image quality.
- Ensure they do not contradict the main subject to avoid degrading the output.
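A minimal sketch of passing a negative prompt through diffusers, assuming a pipeline loaded as in the example below; the negative prompt text itself is only illustrative.

```python
# Assumes `pipe` is a StableDiffusion3Pipeline loaded as in the diffusers example below
image = pipe(
    prompt="Close-up of a macaw, vivid feathers, dimly lit environment, soft warm lighting",
    negative_prompt="blurry, low quality, oversaturated, deformed anatomy",
    num_inference_steps=28,
    guidance_scale=4,
).images[0]
image.save("macaw_negative.jpg")
```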
Example Output
Using diffusers:
import torch
from diffusers import StableDiffusion3Pipeline

# Load the Bokeh 3.5 Medium checkpoint (replace the path with your own download location)
pipe = StableDiffusion3Pipeline.from_pretrained(
    "/mnt/share/pcm_outputs/bokeh_3.5_medium", torch_dtype=torch.bfloat16
)
pipe = pipe.to("cuda")

image = pipe(
    "Close-up of a macaw, dimly lit environment",
    num_inference_steps=28,
    guidance_scale=4,
    height=1920,
    width=1024,
).images[0]
image.save("macaw.jpg")
Using ComfyUI: To use this model in ComfyUI, download the workflow JSON file and load it into the ComfyUI interface.
Recommended Training Configuration
For LoRA fine-tuning, the following tools and settings are recommended:
🔧 Training Tools
- Kohya_ss: GitHub Repository
- Simple Tuner: GitHub Repository
⚙️ Suggested Training Settings
--resolution 1440x1440
--t5xxl_max_token_length 154
--optimizer_type AdamW8bit
--mmdit_lr 1e-4
--text_encoder_lr 5e-5
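Once a LoRA has been trained with one of these tools, it can be loaded into the diffusers pipeline for inference. A minimal sketch follows; the base checkpoint path, LoRA directory, and weight filename are placeholders.

```python
import torch
from diffusers import StableDiffusion3Pipeline

# Load the base Bokeh 3.5 Medium weights (placeholder path)
pipe = StableDiffusion3Pipeline.from_pretrained(
    "path/to/bokeh_3.5_medium", torch_dtype=torch.bfloat16
).to("cuda")

# Attach the trained LoRA (placeholder directory and filename)
pipe.load_lora_weights("path/to/lora_output", weight_name="my_bokeh_lora.safetensors")

image = pipe(
    "Close-up of a macaw, vivid feathers, dimly lit environment",
    num_inference_steps=28,
    guidance_scale=4,
).images[0]
image.save("macaw_lora.jpg")
```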
Contact
- Website: https://tensor.art / https://tusiart.com
- Developed by: TensorArt