Bokeh 3.5 Medium

Bokeh 3.5 Medium is a continued-training model built on the Stable Diffusion 3.5 Medium foundation, further refined on a rigorously curated, high-resolution open-source dataset of 5 million images. This yields outstanding image quality, fine detail preservation, and enhanced controllability.
This model is released under the Stability Community License. For more details and additional resources, visit Tensor.Art or TusiArt.
Overview
- Continued training on SD3.5M, leveraging a large-scale dataset of 5 million high-resolution images, carefully curated for aesthetic quality.
- Supports hybrid short/long caption training for enhanced natural-language understanding (a caption sketch follows this list).
- Short Captions: Focus on core image features.
- Long Captions: Provide broader scene context and atmospheric details.
- Recommended Resolutions: 1920x1024, 1728x1152, 1152x1728, 1280x1664, 1440x1440
- Best-Quality Training Resolution: 1440x1440
- Supports LoRA fine-tuning.
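As a rough illustration of the two caption styles, the pair below is invented for this example (not taken from the training dataset); either style also works as an inference prompt, with the long form carrying the extra scene and atmosphere cues described above.

```python
# Hypothetical caption pair illustrating the short vs. long styles
short_caption = "Close-up portrait of an elderly fisherman"

long_caption = (
    "Close-up portrait of an elderly fisherman with weathered skin and a grey beard, "
    "standing on a misty harbor pier at dawn, soft diffused light, shallow depth of field, "
    "muted blue and amber tones"
)
```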
Advantages
🖼️ High-Quality Image Generation
- State-of-the-art visual fidelity with improved detail extraction and aesthetic consistency.
- Enhanced resolution support up to roughly 2 million pixels (e.g., 1440x1440), ensuring highly detailed image outputs.
- Carefully curated dataset ensures better composition, lighting, and overall artistic appeal.
🎯 Powerful Custom Fine-Tuning
- Exceptional LoRA training support, making it highly effective for:
- Photography
- 3D Rendering
- Illustration
- Concept Art
⚡ Efficient Inference & Training
- Low hardware requirements for inference (a memory-saving sketch follows this list):
- Medium model: 9 GB VRAM (without the T5 text encoder)
- Full-weights inference: 16 GB VRAM (suitable for local deployment)
- LoRA fine-tuning VRAM requirement: 12-32 GB
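A minimal sketch of the reduced-VRAM path using diffusers, dropping the T5 text encoder and enabling CPU offload; the checkpoint path is a placeholder for wherever the Bokeh 3.5 Medium weights are stored locally.

```python
import torch
from diffusers import StableDiffusion3Pipeline

# Load without the T5 text encoder to reduce VRAM use (prompt adherence may degrade slightly)
pipe = StableDiffusion3Pipeline.from_pretrained(
    "path/to/bokeh_3.5_medium",  # placeholder: local path to the Bokeh 3.5 Medium weights
    text_encoder_3=None,
    tokenizer_3=None,
    torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()  # keeps submodules on the GPU only while they are needed

image = pipe(
    "Close-up of a macaw, dimly lit environment",
    num_inference_steps=28,
    guidance_scale=4,
).images[0]
image.save("macaw_low_vram.jpg")
```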
Known Issues
- Potential human anatomy inconsistencies.
- Limited ability to generate photorealistic images.
- Some concepts may suffer from aesthetic quality issues.
Prompting Guide
Use a structured prompt combining:
- Main subject (e.g., "Close-up of a macaw")
- Detailed features (e.g., "vivid feathers, sharp beak")
- Background environment (e.g., "dimly lit environment")
- Atmospheric description (e.g., "soft warm lighting, cinematic mood")
Best Practices:
- Avoid overly complex prompts, as the model already has strong text encoding. Overloading details can cause T5 hallucination artifacts, reducing image quality.
- Do not use excessively short prompts (e.g., single words or 2-3 tokens) unless combined with LoRA or Image2Image (i2i) techniques.
- Avoid mixing too many unrelated concepts, as this can lead to visual distortions and unwanted artifacts.
- Optimal token length: 30-70 tokens (a rough length check is sketched below).
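A small sketch of assembling a structured prompt from the four parts above and roughly checking its length; the CLIP tokenizer is used here only as an approximation, and counts will differ somewhat from the T5 tokenizer.

```python
from transformers import CLIPTokenizer

# Structured prompt parts following the guide above
subject = "Close-up of a macaw"
features = "vivid feathers, sharp beak"
background = "dimly lit environment"
atmosphere = "soft warm lighting, cinematic mood"
prompt = ", ".join([subject, features, background, atmosphere])

# Rough length check against the recommended 30-70 token range
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
num_tokens = len(tokenizer(prompt).input_ids)
print(f"{prompt!r} -> {num_tokens} tokens")
```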
Negative Prompting
- Negative prompts strongly influence image quality.
- Ensure they do not contradict the main subject to avoid degrading the output.
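A minimal sketch of passing a negative prompt through diffusers, assuming a pipeline loaded as in the example below; the negative prompt text itself is only illustrative.

```python
# Assumes `pipe` is a StableDiffusion3Pipeline loaded as in the diffusers example below
image = pipe(
    prompt="Close-up of a macaw, vivid feathers, dimly lit environment, soft warm lighting",
    negative_prompt="blurry, low quality, oversaturated, deformed anatomy",
    num_inference_steps=28,
    guidance_scale=4,
).images[0]
image.save("macaw_negative.jpg")
```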
Example Output
Using diffusers:
import torch
from diffusers import StableDiffusion3Pipeline

# Load the Bokeh 3.5 Medium checkpoint (replace the path with your own download location)
pipe = StableDiffusion3Pipeline.from_pretrained(
    "/mnt/share/pcm_outputs/bokeh_3.5_medium", torch_dtype=torch.bfloat16
)
pipe = pipe.to("cuda")

image = pipe(
    "Close-up of a macaw, dimly lit environment",
    num_inference_steps=28,
    guidance_scale=4,
    height=1920,
    width=1024,
).images[0]
image.save("macaw.jpg")
Using ComfyUI: To use this model in ComfyUI, download the workflow JSON file and load it into the ComfyUI interface.
Recommended Training Configuration
For LoRA fine-tuning, the following tools and settings are recommended:
🔧 Training Tools
- Kohya_ss: GitHub Repository
- Simple Tuner: GitHub Repository
⚙️ Suggested Training Settings
--resolution 1440x1440
--t5xxl_max_token_length 154
--optimizer_type AdamW8bit
--mmdit_lr 1e-4
--text_encoder_lr 5e-5
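Once a LoRA has been trained with one of these tools, it can be loaded into the diffusers pipeline for inference. A minimal sketch follows; the base checkpoint path, LoRA directory, and weight filename are placeholders.

```python
import torch
from diffusers import StableDiffusion3Pipeline

# Load the base Bokeh 3.5 Medium weights (placeholder path)
pipe = StableDiffusion3Pipeline.from_pretrained(
    "path/to/bokeh_3.5_medium", torch_dtype=torch.bfloat16
).to("cuda")

# Attach the trained LoRA (placeholder directory and filename)
pipe.load_lora_weights("path/to/lora_output", weight_name="my_bokeh_lora.safetensors")

image = pipe(
    "Close-up of a macaw, vivid feathers, dimly lit environment",
    num_inference_steps=28,
    guidance_scale=4,
).images[0]
image.save("macaw_lora.jpg")
```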
Contact
- Website: https://tensor.art / https://tusiart.com
- Developed by: TensorArt