Spaces: zerogpu-aoti/Qwen-Image-Edit-Relight

linoyts (HF Staff) committed on Aug 24

Commit 38fcaae (verified) · 1 parent: 9d1698f

Update app.py

app.py +123 -53

@@ -12,7 +12,7 @@ from huggingface_hub import InferenceClient
 import math
 # --- Prompt Enhancement using Hugging Face InferenceClient ---
-def polish_prompt_hf(original_prompt, system_prompt):
     """
     Rewrites the prompt using a Hugging Face InferenceClient.
     """
@@ -25,19 +25,52 @@ def polish_prompt_hf(original_prompt, system_prompt):
     try:
         # Initialize the client
         client = InferenceClient(
-            provider="cerebras",
             api_key=api_key,
         )
         # Format the messages for the chat completions API
         messages = [
             {"role": "system", "content": system_prompt},
-            {"role": "user", "content": original_prompt}
         ]
         # Call the API
         completion = client.chat.completions.create(
-            model="Qwen/Qwen3-235B-A22B-Instruct-2507",
             messages=messages,
         )
@@ -70,58 +103,96 @@ def polish_prompt(prompt, img):
     Main function to polish prompts for image editing using HF inference.
     """
     SYSTEM_PROMPT = '''
-# Edit Instruction Rewriter
-You are a professional edit instruction rewriter. Your task is to generate a precise, concise, and visually achievable professional-level edit instruction based on the user-provided instruction and the image to be edited.
 Please strictly follow the rewriting rules below:
 ## 1. General Principles
-- Keep the rewritten prompt **concise**. Avoid overly long sentences and reduce unnecessary descriptive language.
-- If the instruction is contradictory, vague, or unachievable, prioritize reasonable inference and correction, and supplement details when necessary.
-- Keep the core intention of the original instruction unchanged, only enhancing its clarity, rationality, and visual feasibility.
-- All added objects or modifications must align with the logic and style of the edited input image's overall scene.
-## 2. Task Type Handling Rules
-### 1. Add, Delete, Replace Tasks
-- If the instruction is clear (already includes task type, target entity, position, quantity, attributes), preserve the original intent and only refine the grammar.
-- If the description is vague, supplement with minimal but sufficient details (category, color, size, orientation, position, etc.). For example:
-    > Original: "Add an animal"
-    > Rewritten: "Add a light-gray cat in the bottom-right corner, sitting and facing the camera"
-- Remove meaningless instructions: e.g., "Add 0 objects" should be ignored or flagged as invalid.
-- For replacement tasks, specify "Replace Y with X" and briefly describe the key visual features of X.
-### 2. Text Editing Tasks
-- All text content must be enclosed in English double quotes " ". Do not translate or alter the original language of the text, and do not change the capitalization.
-- **For text replacement tasks, always use the fixed template:**
-    - Replace "xx" to "yy".
-    - Replace the xx bounding box to "yy".
-- If the user does not specify text content, infer and add concise text based on the instruction and the input image's context. For example:
-    > Original: "Add a line of text" (poster)
-    > Rewritten: "Add text "LIMITED EDITION" at the top center with slight shadow"
-- Specify text position, color, and layout in a concise way.
-### 3. Human Editing Tasks
-- Maintain the person's core visual consistency (ethnicity, gender, age, hairstyle, expression, outfit, etc.).
-- If modifying appearance (e.g., clothes, hairstyle), ensure the new element is consistent with the original style.
-- **For expression changes, they must be natural and subtle, never exaggerated.**
-- If deletion is not specifically emphasized, the most important subject in the original image (e.g., a person, an animal) should be preserved.
-    - For background change tasks, emphasize maintaining subject consistency at first.
-- Example:
-    > Original: "Change the person's hat"
-    > Rewritten: "Replace the man's hat with a dark brown beret; keep smile, short hair, and gray jacket unchanged"
-### 4. Style Transformation or Enhancement Tasks
-- If a style is specified, describe it concisely with key visual traits. For example:
-    > Original: "Disco style"
-    > Rewritten: "1970s disco: flashing lights, disco ball, mirrored walls, colorful tones"
-- If the instruction says "use reference style" or "keep current style," analyze the input image, extract main features (color, composition, texture, lighting, art style), and integrate them concisely.
-- **For coloring tasks, including restoring old photos, always use the fixed template:** "Restore old photograph, remove scratches, reduce noise, enhance details, high resolution, realistic, natural skin tones, clear facial features, no distortion, vintage photo restoration"
-- If there are other changes, place the style description at the end.
-## 3. Rationality and Logic Checks
-- Resolve contradictory instructions: e.g., "Remove all trees but keep all trees" should be logically corrected.
-- Add missing key information: if position is unspecified, choose a reasonable area based on composition (near subject, empty space, center/edges).
 # Output Format
 Return only the rewritten instruction text directly, without JSON formatting or any other wrapper.
 '''
@@ -130,8 +201,7 @@ Return only the rewritten instruction text directly, without JSON formatting or
     # but keeping the interface consistent
     full_prompt = f"{SYSTEM_PROMPT}\n\nUser Input: {prompt}\n\nRewritten Prompt:"
-    return polish_prompt_hf(full_prompt, SYSTEM_PROMPT)
 # --- Model Loading ---
 dtype = torch.bfloat16
 device = "cuda" if torch.cuda.is_available() else "cpu"

 import math
 # --- Prompt Enhancement using Hugging Face InferenceClient ---
+def polish_prompt_hf(original_prompt, system_prompt, img):
     """
     Rewrites the prompt using a Hugging Face InferenceClient.
     """
     try:
         # Initialize the client
         client = InferenceClient(
+            provider="nebius",
             api_key=api_key,
         )
+                # Convert PIL Image to base64 data URL
+        image_url = None
+        if img is not None:
+            # If img is a PIL Image
+            if hasattr(img, 'save'):  # Check if it's a PIL Image
+                buffered = BytesIO()
+                img.save(buffered, format="PNG")
+                img_base64 = base64.b64encode(buffered.getvalue()).decode('utf-8')
+                image_url = f"data:image/png;base64,{img_base64}"
+            # If img is already a file path (string)
+            elif isinstance(img, str):
+                with open(img, "rb") as image_file:
+                    img_base64 = base64.b64encode(image_file.read()).decode('utf-8')
+                image_url = f"data:image/png;base64,{img_base64}"
+            else:
+                print(f"Warning: Unexpected image type: {type(img)}")
+                return original_prompt
         # Format the messages for the chat completions API
         messages = [
             {"role": "system", "content": system_prompt},
+            {
+                "role": "user",
+                "content": [
+                    {
+                        "type": "text",
+                        "text": original_prompt
+                    },
+                    {
+                        "type": "image_url",
+                        "image_url": {
+                            "url": image_url
+                        }
+                    }
+                ]
+            }
         ]
         # Call the API
         completion = client.chat.completions.create(
+            model="Qwen/Qwen2.5-VL-72B-Instruct",
             messages=messages,
         )
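Review note on the image handling added to `polish_prompt_hf`: as written, the file-path branch labels every file as `image/png` regardless of its real type, and when `img` is `None` the `image_url` part is still attached with a `None` URL. The sketch below is one possible way to tighten this, not the commit's code. The helper names `to_data_url` and `build_messages` are hypothetical, and guessing the MIME type from the file extension is an assumption about what callers pass in.

```python
import base64
import mimetypes
from io import BytesIO


def to_data_url(img):
    """Return a base64 data URL for a PIL-style image or a file path, else None."""
    if img is None:
        return None
    if hasattr(img, "save"):
        # Duck-typed PIL check, same test the commit uses.
        buffered = BytesIO()
        img.save(buffered, format="PNG")
        raw, mime = buffered.getvalue(), "image/png"
    elif isinstance(img, str):
        with open(img, "rb") as image_file:
            raw = image_file.read()
        # Assumption: the extension reflects the real format; fall back to PNG.
        mime = mimetypes.guess_type(img)[0] or "image/png"
    else:
        print(f"Warning: Unexpected image type: {type(img)}")
        return None
    encoded = base64.b64encode(raw).decode("utf-8")
    return f"data:{mime};base64,{encoded}"


def build_messages(system_prompt, user_text, image_url=None):
    """Build chat messages; attach the image part only when a URL exists."""
    content = [{"type": "text", "text": user_text}]
    if image_url is not None:
        content.append({"type": "image_url", "image_url": {"url": image_url}})
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": content},
    ]
```

With these helpers the call site would reduce to `messages = build_messages(system_prompt, original_prompt, to_data_url(img))`, so a missing or unsupported image degrades to a text-only request instead of sending a null URL.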
     Main function to polish prompts for image editing using HF inference.
     """
     SYSTEM_PROMPT = '''
+# Lighting Edit Instruction Rewriter
+You are a professional lighting edit instruction rewriter. Your task is to rewrite user-provided relighting instructions into precise, concise, and technically accurate lighting edit instructions that are better suited for image editing models.
 Please strictly follow the rewriting rules below:
 ## 1. General Principles
+- **Rewrite the input instruction** to be **concise and technically specific**. Use professional lighting terminology.
+- If the original instruction is contradictory, vague, or technically unfeasible, rewrite it to prioritize physically realistic lighting corrections.
+- Preserve the core intention of the original instruction while enhancing technical accuracy and visual feasibility.
+- All lighting modifications must maintain realistic physics and natural light behavior.
+- **Preserve subject integrity**: Keep facial features, clothing, pose, and other non-lighting elements unchanged unless specifically requested in the original instruction.
+## 2. Lighting Task Categories
+### 1. Light Direction and Positioning
+- **Specify precise direction**: front-lit, back-lit, side-lit (left/right), top-lit, bottom-lit, three-quarter lighting
+- **Include angle details**: 45-degree side lighting, overhead lighting, low-angle dramatic lighting
+- **For vague instructions like "better lighting"**: analyze current lighting issues and specify improvement (e.g., "Add soft front lighting to reduce harsh shadows on face")
+### 2. Light Quality and Characteristics
+- **Hard vs. Soft**: "hard directional lighting with sharp shadows" vs. "soft diffused lighting with gentle shadows"
+- **Intensity**: bright, moderate, dim, dramatic high-contrast, subtle low-contrast
+- **Coverage**: full illumination, selective lighting, spotlight effect, rim lighting, fill lighting
+### 3. Color Temperature and Mood
+- **Temperature specification**: warm (3000K-3500K), neutral (4000K-5000K), cool (5500K-6500K), daylight (6500K+)
+- **Mood descriptors**: golden hour warmth, clinical cool lighting, cozy warm ambiance, dramatic cool shadows
+- **Mixed lighting**: "warm key light with cool rim lighting," "daylight from window with warm interior lighting"
+### 4. Environmental and Context-Specific Lighting
+- **Time of day**: morning soft light, midday harsh sun, golden hour, blue hour, night artificial lighting
+- **Location-based**: studio lighting setup, natural outdoor lighting, indoor ambient lighting, street lighting
+- **Weather conditions**: overcast soft lighting, direct sunlight, sunset glow, stormy dramatic lighting
+### 5. Technical Lighting Setups
+- **Professional terminology**: key light, fill light, rim/hair light, background light, bounce lighting
+- **Studio setups**: Rembrandt lighting, butterfly lighting, split lighting, loop lighting
+- **Multiple sources**: "main soft box from camera right, fill light from left, rim light from behind"
+## 3. Instruction Rewriting Examples
+### For Basic Lighting Changes:
+- **Input**: "Make it brighter" → **Rewritten**: "Increase overall lighting with soft front illumination, maintain natural shadows"
+- **Input**: "Dramatic lighting" → **Rewritten**: "Add strong side lighting from camera left with deep shadows on right side, high contrast"
+### For Direction Changes:
+- **Input**: "Light from behind" → **Rewritten**: "Add rim lighting from behind subject, maintain visibility of facial features with subtle fill light"
+- **Input**: "Window lighting" → **Rewritten**: "Natural daylight from camera left, soft directional lighting mimicking window light"
+### For Mood/Atmosphere:
+- **Input**: "Warmer lighting" → **Rewritten**: "Adjust to warm 3200K lighting, golden tone, soft shadows"
+- **Input**: "Studio lighting" → **Rewritten**: "Professional three-point lighting: soft key light camera right, fill light camera left, rim light from behind"
+## 4. Technical Considerations and Constraints
+### Physical Accuracy:
+- Ensure shadow directions match light source positions
+- Maintain consistent color temperature across the scene
+- Respect surface materials (how light interacts with skin, fabric, metal, etc.)
+- Consider ambient light contribution and bounce lighting
+### Preservation Rules:
+- **Always specify**: "maintain facial features unchanged," "preserve original pose and expression"
+- **For portraits**: "keep skin texture and facial structure identical, only adjust lighting"
+- **For scenes**: "preserve all objects and composition, modify lighting only"
+### Quality Standards:
+- **Include resolution/quality terms**: "realistic lighting physics," "natural light falloff," "smooth gradients"
+- **Avoid artifacts**: "no harsh light cutoffs," "natural shadow transitions," "realistic highlight rolloff"
+## 5. Common Lighting Scenarios
+### Portrait Relighting:
+"Apply soft key lighting from camera right at 45-degree angle, add gentle fill light from left to reduce shadow contrast, maintain natural skin tones and facial features"
+### Scene Relighting:
+"Change to golden hour lighting: warm 3000K directional light from camera right, long soft shadows, enhanced ambient warm bounce light"
+### Dramatic Relighting:
+"High-contrast lighting setup: strong key light from camera left, minimal fill light, deep shadows on right side, dramatic mood while preserving subject clarity"
+### Natural Environment:
+"Simulate overcast daylight: soft diffused lighting from above, minimal shadows, cool 6000K color temperature, even illumination across scene"
+## 6. Error Prevention
+- Never specify impossible lighting (e.g., "shadows pointing toward light source")
+- Always include both light addition and shadow consideration
+- Specify color temperature changes when requesting "warm" or "cool" lighting
 # Output Format
 Return only the rewritten instruction text directly, without JSON formatting or any other wrapper.
 '''
     # but keeping the interface consistent
     full_prompt = f"{SYSTEM_PROMPT}\n\nUser Input: {prompt}\n\nRewritten Prompt:"
+    return polish_prompt_hf(full_prompt, SYSTEM_PROMPT, img)
 # --- Model Loading ---
 dtype = torch.bfloat16
 device = "cuda" if torch.cuda.is_available() else "cpu"
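Review note on `polish_prompt`: `full_prompt` already embeds `SYSTEM_PROMPT`, and the same text is passed again as the system message, so the rules reach the model twice. In addition, the unexpected-image-type branch of `polish_prompt_hf` returns `original_prompt`, which at that point is the whole `full_prompt` rather than the user's instruction. The following is a minimal sketch of one way to avoid both, assuming the duplication is not intentional. `build_user_prompt`, `polish_prompt_sketch`, and the `rewrite_fn` parameter are hypothetical names; `rewrite_fn` stands in for `polish_prompt_hf`.

```python
def build_user_prompt(prompt):
    """Wrap only the user's instruction; the rules stay in the system message."""
    return f"User Input: {prompt}\n\nRewritten Prompt:"


def polish_prompt_sketch(prompt, img, system_prompt, rewrite_fn):
    """Rewrite `prompt`, falling back to the raw user prompt on any failure.

    `rewrite_fn(user_text, system_prompt, img)` is a stand-in for the
    commit's `polish_prompt_hf`.
    """
    user_text = build_user_prompt(prompt)
    try:
        rewritten = rewrite_fn(user_text, system_prompt, img)
    except Exception as exc:
        print(f"Warning: prompt rewriting failed: {exc}")
        return prompt
    # An empty result, or the input echoed back unchanged, means no rewrite happened.
    if not rewritten or rewritten == user_text:
        return prompt
    return rewritten.strip()
```

Passing the rewriter in as an argument keeps the sketch testable without network access; in `app.py` the call would be `polish_prompt_sketch(prompt, img, SYSTEM_PROMPT, polish_prompt_hf)`.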