Bokeh 3.5 Medium

Bokeh 3.5 Medium uses Stable Diffusion 3.5 Medium as its foundation model. It was post-trained on a dataset of 5M high-resolution open-source images that passed rigorous quality and aesthetic screening, giving excellent image quality, high fidelity for natural images, well-preserved fine detail, and improved controllability.

This model is released under the Stability Community License. For additional resources, visit Tensor.Art or TusiArt.

Overview

Continues training from SD3.5M on carefully curated high-resolution data to achieve excellent image quality.
Trained with mixed short/long natural language captions.
- Short Captions: Focus on the core subject content of the image.
- Long Captions: Provide broader descriptions of the scene environment and atmosphere.
Recommended Resolutions:
1920x1024, 1728x1152, 1152x1728, 1280x1664, 1440x1440
Powerful customized fine-tuning performance that can be widely used for downstream production tasks.
Generates images in 8 to 10 steps through strong distillation: a high-resolution image takes about 5 seconds on a 3090-class GPU, at some cost in quality. Use either the 8-step LoRA with the base checkpoint or the standalone 8-step checkpoint.
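
As a rough sketch of the LoRA route described above, the distilled weights could be applied on top of the base checkpoint with diffusers. The LoRA path is a placeholder and the helper names are hypothetical; this card does not give the actual file name or any other recommended settings for the distilled variant, so check the model repository before relying on it.

```python
def few_step_settings(steps: int = 8) -> dict:
    """Sampler settings for the distilled variant (8 to 10 steps per this card)."""
    if not 8 <= steps <= 10:
        raise ValueError("the distilled weights target 8 to 10 steps")
    return {"num_inference_steps": steps}


def load_few_step_pipeline(lora_path: str):
    """Load the base checkpoint and apply a few-step LoRA.

    `lora_path` is an assumption: point it at wherever the 8-step LoRA
    file actually lives (local .safetensors file or a Hub repository).
    """
    # Heavy imports stay local so few_step_settings() works without a GPU.
    import torch
    from diffusers import StableDiffusion3Pipeline

    pipe = StableDiffusion3Pipeline.from_pretrained(
        "tensorart/bokeh_3.5_medium", torch_dtype=torch.bfloat16
    )
    pipe.load_lora_weights(lora_path)
    return pipe.to("cuda")


def generate_fast(pipe, prompt: str, steps: int = 8):
    """Run the pipeline with the reduced step count and return a PIL image."""
    return pipe(prompt, **few_step_settings(steps)).images[0]
```

With a pipeline from `load_few_step_pipeline("path/to/8step_lora.safetensors")`, calling `generate_fast(pipe, "Close-up of a macaw, dimly lit environment")` would produce one image in 8 steps.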

Advantages

🖼️ High-Quality Image Generation

State-of-the-art visual fidelity with improved detail extraction and aesthetic consistency.
Enhanced resolution support up to roughly 2 million pixels, ensuring highly detailed image outputs.
Carefully curated dataset ensures better composition, lighting, and overall artistic appeal.
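
The recommended resolutions listed in the Overview all sit near two million pixels. The short sketch below simply does that arithmetic and checks that each side is a multiple of 16, which, as far as I know, is what the diffusers SD3 pipeline expects. The helper name `describe` is made up for this illustration.

```python
# Recommended resolutions from the Overview section, as listed there.
RECOMMENDED = [(1920, 1024), (1728, 1152), (1152, 1728), (1280, 1664), (1440, 1440)]


def describe(size: tuple[int, int]) -> dict:
    """Return pixel count, megapixels, and a divisibility check for one size."""
    a, b = size
    return {
        "size": f"{a}x{b}",
        "pixels": a * b,
        "megapixels": round(a * b / 1e6, 2),
        # Assumption: sides should be divisible by 16 (VAE factor 8 x patch size 2).
        "multiple_of_16": a % 16 == 0 and b % 16 == 0,
    }


SUMMARY = [describe(size) for size in RECOMMENDED]

for row in SUMMARY:
    print(row)
```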

🎯 Powerful Custom Fine-Tuning

Exceptional LoRA training support, making it highly effective for:
- Photography
- 3D Rendering
- Illustration
- Concept Art

⚡ Efficient Inference & Training

Low hardware requirements for inference:
- Medium model: 9GB VRAM (without T5)
- Full weights inference: 16GB VRAM (suitable for local deployment)
LoRA fine-tuning VRAM requirement: 12GB - 32GB
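
One way to reach the lower "without T5" memory figure is to skip loading the third text encoder, which diffusers supports by passing `text_encoder_3=None` and `tokenizer_3=None`. The sketch below is an illustration under that assumption; the helper names are hypothetical, the VRAM numbers above are not verified by this code, and `enable_model_cpu_offload()` requires the `accelerate` package.

```python
def pipeline_kwargs(use_t5: bool = True) -> dict:
    """Extra from_pretrained() arguments; dropping T5 reduces memory use."""
    if use_t5:
        return {}
    return {"text_encoder_3": None, "tokenizer_3": None}


def load_pipeline(use_t5: bool = True, offload: bool = False):
    """Load Bokeh 3.5 Medium, optionally without T5 and with CPU offload."""
    # Heavy imports stay local so pipeline_kwargs() can be used on its own.
    import torch
    from diffusers import StableDiffusion3Pipeline

    pipe = StableDiffusion3Pipeline.from_pretrained(
        "tensorart/bokeh_3.5_medium",
        torch_dtype=torch.bfloat16,
        **pipeline_kwargs(use_t5),
    )
    if offload:
        # Moves each sub-model to the GPU only while it is running.
        pipe.enable_model_cpu_offload()
    else:
        pipe = pipe.to("cuda")
    return pipe
```

Dropping T5 usually trades some prompt adherence on long, descriptive captions for the memory saving, so it is worth comparing outputs before settling on it.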

Known Issues

Potential human anatomy inconsistencies.
Limited ability to generate photorealistic images.
Some concepts may suffer from aesthetic quality issues.

Prompting Guide

Use a structured prompt combining:

Main subject (e.g., "Close-up of a macaw")
Detailed features (e.g., "vivid feathers, sharp beak")
Background environment (e.g., "dimly lit environment")
Atmospheric description (e.g., "soft warm lighting, cinematic mood")
Optimal token length: 30-70 tokens.
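
The four-part structure above can be assembled mechanically. The following is a minimal sketch: `build_prompt` and `rough_token_count` are hypothetical helpers, and the count is only a crude word-and-punctuation estimate, not the output of the model's actual tokenizers.

```python
import re


def build_prompt(subject: str, features: str = "", background: str = "",
                 atmosphere: str = "") -> str:
    """Join the non-empty prompt parts in the recommended order."""
    parts = (subject, features, background, atmosphere)
    return ", ".join(p.strip() for p in parts if p and p.strip())


def rough_token_count(prompt: str) -> int:
    """Crude estimate: count words and punctuation marks separately."""
    return len(re.findall(r"\w+|[^\w\s]", prompt))


prompt = build_prompt(
    subject="Close-up of a macaw",
    features="vivid feathers, sharp beak",
    background="dimly lit environment",
    atmosphere="soft warm lighting, cinematic mood",
)
print(prompt)
print("approximate tokens:", rough_token_count(prompt))
```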

Example Output

Using diffusers:

import torch
from diffusers import StableDiffusion3Pipeline

# Load the checkpoint in bfloat16 and move it to the GPU.
pipe = StableDiffusion3Pipeline.from_pretrained("tensorart/bokeh_3.5_medium", torch_dtype=torch.bfloat16)
pipe = pipe.to("cuda")

image = pipe(
    "Close-up of a macaw, dimly lit environment",
    num_inference_steps=28,
    guidance_scale=4,
    height=1920,
    width=1024,
    negative_prompt="anime,cartoon,bad hands,extra finger,blurred,text,watermark",
    negative_prompt_3="",  # empty negative prompt for the third (T5) text encoder
).images[0]
image.save("macaw.jpg")