Spaces: sayakpaul/grade_images_with_gemini

sayakpaul HF Staff committed on Feb 10

Commit

c2cc0af

1 Parent(s): f5500f9

updates

Files changed (8)

.gradio/cached_examples/8/log.csv +13 -0
README.md +2 -2
app.py +88 -0
car.jpg +0 -0
green_creature.jpg +0 -0
requirements.txt +3 -0
utils.py +41 -0
verifier_prompt.txt +54 -0

.gradio/cached_examples/8/log.csv ADDED

	@@ -0,0 +1,13 @@

+Grading Output,timestamp
+"* accuracy_to_prompt: 9.0 (explanation: The image accurately depicts a shiny black SUV with a mountain background, closely following the prompt's description.)
+* creativity_and_originality: 7.0 (explanation: The image showcases a standard but appealing composition, capturing a realistic scene. While it doesn't venture into highly creative interpretations, it's well-executed.)
+* visual_quality_and_realism: 9.0 (explanation: The visual quality is excellent, with sharp details and realistic rendering. The lighting and textures are well-handled, contributing to a high level of realism.)
+* consistency_and_cohesion: 9.0 (explanation: The image maintains a high level of internal consistency. All elements are logically placed, and the lighting and perspective align well, creating a cohesive scene.)
+* emotional_and_thematic_resonance: 8.0 (explanation: The image conveys a sense of adventure and exploration. The mountain backdrop and the sturdy SUV evoke feelings of freedom and the call of the outdoors.)
+* overall_score: 8.4 (explanation: Considering all aspects, the image is a strong and well-executed representation of the prompt. It effectively combines realism, accuracy, and emotional resonance.)",2025-02-10 11:28:15.347939
+"* accuracy_to_prompt: 8.0 (explanation: The image accurately depicts a green creature in a forest, fulfilling the main elements of the prompt. However, 'funny' is subjective and open to interpretation.)
+* creativity_and_originality: 6.0 (explanation: While the image fulfills the basic requirements, it doesn't show a high level of creativity. The creature's design is somewhat generic.)
+* visual_quality_and_realism: 9.0 (explanation: The visual quality is high, with good detail and rendering. The creature and forest are realistically depicted, enhancing the overall appeal.)
+* consistency_and_cohesion: 9.0 (explanation: The image maintains internal consistency with the creature fitting well within its forest surroundings. The lighting and perspective are coherent.)
+* emotional_and_thematic_resonance: 7.0 (explanation: The image evokes a lighthearted and slightly whimsical feeling. It doesn't strongly resonate emotionally but does align with the 'funny' aspect.)
+* overall_score: 7.5 (explanation: The overall score reflects a solid execution of the prompt with minor areas for improvement in creativity and emotional depth. The image is well-composed and aligned with the prompt.)",2025-02-10 11:28:22.288757
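
The cached log above stores each grading as one quoted, multi-line CSV field followed by a timestamp. As a rough sketch of how such a file could be read back into numeric scores (this helper is illustrative and not part of the commit; `parse_log`, `BULLET`, and the shortened sample rows are assumptions made for the example):

```python
import csv
import io
import re

# One abbreviated row in the same shape as log.csv (explanations shortened).
SAMPLE_LOG = '''Grading Output,timestamp
"* accuracy_to_prompt: 9.0 (explanation: Matches the prompt.)
* overall_score: 8.4 (explanation: Strong overall.)",2025-02-10 11:28:15.347939
'''

# One bullet per line: "* <aspect>: <score> (explanation: <text>)".
# The optional "**" allows for the bold keys that format_response emits.
BULLET = re.compile(r"^\* (?:\*\*)?(\w+)(?:\*\*)?: ([\d.]+) \(explanation: (.*)\)$")


def parse_log(text):
    """Return a list of (timestamp, {aspect: score}) tuples."""
    parsed = []
    for row in csv.DictReader(io.StringIO(text)):
        scores = {}
        for line in row["Grading Output"].splitlines():
            match = BULLET.match(line.strip())
            if match:
                scores[match.group(1)] = float(match.group(2))
        parsed.append((row["timestamp"], scores))
    return parsed


for timestamp, scores in parse_log(SAMPLE_LOG):
    print(timestamp, scores)
```

The `csv` module handles the embedded newlines because the field is quoted, so no manual line joining is needed.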

README.md CHANGED

@@ -1,5 +1,5 @@
 ---
-title: Grade Images With Gemini
+title: Grade Images with Gemini
 emoji: 🚀
 colorFrom: yellow
 colorTo: purple
@@ -7,7 +7,7 @@ sdk: gradio
 sdk_version: 5.15.0
 app_file: app.py
 pinned: false
-short_description: Uses Gemini 2.0 flash to grade images.
+short_description: Uses Gemini 2.0 Flash to grade images.
 ---
 Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

app.py ADDED

	@@ -0,0 +1,88 @@

+import gradio as gr
+from google import genai
+from utils import *
+from PIL import Image
+import os
+client = genai.Client(api_key=os.getenv("GEMINI_API_KEY"))
+system_instruction = load_verifier_prompt()
+generation_config = types.GenerateContentConfig(
+    system_instruction=system_instruction,
+    response_mime_type="application/json",
+    response_schema=list[Grading],
+    seed=1994,
+)
+def make_inputs(prompt, image):
+    inputs = []
+    inputs.extend(prepare_inputs(prompt=prompt, image=image))
+    return inputs
+def format_response(response: dict):
+    out = ""
+    for key, value in response.items():
+        score = f"* **{key}**: {value['score']} (explanation: {value['explanation']})\n"
+        out += score
+    return out
+def grade(prompt, image):
+    inputs = make_inputs(prompt, image)
+    response = client.models.generate_content(
+        model="gemini-2.0-flash", contents=types.Content(parts=inputs, role="user"), config=generation_config
+    )
+    parsed_response = response.parsed[0]
+    return format_response(parsed_response)
+examples = [
+    ["realistic photo a shiny black SUV car with a mountain in the background.", Image.open("car.jpg")],
+    ["photo a green and funny creature standing in front a lightweight forest.", Image.open("green_creature.jpg")],
+]
+css = """
+#col-container {
+    margin: 0 auto;
+    max-width: 520px;
+}
+"""
+with gr.Blocks(css=css) as demo:
+    with gr.Column(elem_id="col-container"):
+        gr.Markdown(
+            f"""# Grade images with Gemini 2.0 Flash
+        Following aspects are considered during grading:
+        * Accuracy to Prompt
+        * Creativity and Originality
+        * Visual Quality and Realism
+        * Consistency and Cohesion
+        * Emotional or Thematic Resonance
+        The [system prompt](./verifier_prompt.txt) comes from the paper: [Inference-Time Scaling for Diffusion Models beyond Scaling Denoising Steps](https://arxiv.org/abs/2501.09732).
+        """
+        )
+        with gr.Row():
+            prompt = gr.Text(
+                label="Prompt",
+                show_label=False,
+                max_lines=1,
+                placeholder="Enter the prompt that generated the image to be graded.",
+                container=False,
+            )
+            run_button = gr.Button("Run", scale=0)
+        image = gr.Image(format="png", type="pil", label="Image", placeholder="The image to be graded.")
+        result = gr.Markdown(label="Grading Output")
+        gr.Examples(examples=examples, fn=grade, inputs=[prompt, image], outputs=[result], cache_examples=True)
+    gr.on(triggers=[run_button.click, prompt.submit], fn=grade, inputs=[prompt, image], outputs=[result])
+demo.launch()
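
In `app.py`, `grade` indexes `response.parsed[0]` directly and passes the result to `format_response`. The sketch below shows the dictionary shape that formatting step expects, plus a defensive variant for the case where nothing was parsed. Whether `response.parsed` can actually be `None` or empty in a given SDK version is an assumption here, and `format_grading` and its fallback message are hypothetical names, not part of the commit:

```python
from typing import Optional


def format_grading(parsed: Optional[list]) -> str:
    """Format the first grading in `parsed`, or return a fallback message."""
    if not parsed:  # covers both None and an empty list
        return "No grading was returned; please try again."
    grading = parsed[0]
    # Same bullet format as format_response in app.py.
    lines = [
        f"* **{key}**: {value['score']} (explanation: {value['explanation']})"
        for key, value in grading.items()
    ]
    return "\n".join(lines) + "\n"


# Hypothetical parsed output containing two of the six Grading keys.
sample = [
    {
        "accuracy_to_prompt": {"score": 9.0, "explanation": "Matches the prompt."},
        "overall_score": {"score": 8.4, "explanation": "Strong overall."},
    }
]
print(format_grading(sample))
print(format_grading(None))
```

For a well-formed response the output is identical to what `format_response` produces; only the empty case differs.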

car.jpg ADDED

green_creature.jpg ADDED

requirements.txt ADDED

	@@ -0,0 +1,3 @@

+google-genai
+typing-extensions
+Pillow

utils.py ADDED

	@@ -0,0 +1,41 @@

+import typing_extensions as typing
+from PIL import Image
+import io
+from google.genai import types
+class Score(typing.TypedDict):
+    score: float
+    explanation: str
+class Grading(typing.TypedDict):
+    accuracy_to_prompt: Score
+    creativity_and_originality: Score
+    visual_quality_and_realism: Score
+    consistency_and_cohesion: Score
+    emotional_and_thematic_resonance: Score
+    overall_score: Score
+def convert_to_bytes(image: Image.Image) -> bytes:
+    image_bytes_io = io.BytesIO()
+    image.save(image_bytes_io, format="PNG")
+    return image_bytes_io.getvalue()
+def prepare_inputs(prompt: str, image: Image.Image):
+    """Prepare inputs for the API from a given prompt and image."""
+    inputs = [
+        types.Part.from_text(text=prompt),
+        types.Part.from_bytes(data=convert_to_bytes(image), mime_type="image/png"),
+    ]
+    return inputs
+def load_verifier_prompt():
+    """Loads the system prompt for Gemini when it acts as a verifier to grade images."""
+    with open("verifier_prompt.txt", "r") as f:
+        verifier_prompt = f.read().replace('"""', "")
+    return verifier_prompt
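
`load_verifier_prompt` removes the triple quotes that wrap `verifier_prompt.txt` but leaves the newlines next to them in place. A small sketch of that behavior, using a temporary stand-in file rather than the real prompt (`load_prompt` and the one-line prompt text are hypothetical, chosen only for illustration):

```python
import tempfile
from pathlib import Path


def load_prompt(path) -> str:
    """Read a prompt file and drop the triple quotes, as load_verifier_prompt does."""
    text = Path(path).read_text(encoding="utf-8")
    return text.replace('"""', "")


with tempfile.TemporaryDirectory() as tmp:
    prompt_file = Path(tmp) / "verifier_prompt.txt"  # hypothetical stand-in file
    prompt_file.write_text('"""\nYou are a grader.\n"""\n', encoding="utf-8")
    loaded = load_prompt(prompt_file)

print(repr(loaded))          # quotes are gone, surrounding newlines remain
print(repr(loaded.strip()))  # optional: trim the leftover blank lines
```

Calling `.strip()` on the result is one possible way to remove the leftover leading and trailing newlines before passing the text as a system instruction; the commit itself does not do this.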

verifier_prompt.txt ADDED

	@@ -0,0 +1,54 @@

+"""
+You are a multimodal large-language model tasked with evaluating images
+generated by a text-to-image model. Your goal is to assess each generated
+image based on specific aspects and provide a detailed critique, along with
+a scoring system. The final output should be formatted as a JSON object
+containing individual scores for each aspect and an overall score. The keys
+in the JSON object should be: `accuracy_to_prompt`, `creativity_and_originality`,
+`visual_quality_and_realism`, `consistency_and_cohesion`,
+`emotional_and_thematic_resonance`, and `overall_score`. Below is a comprehensive
+guide to follow in your evaluation process:
+1. Key Evaluation Aspects and Scoring Criteria:
+For each aspect, provide a score from 0 to 10, where 0 represents poor
+performance and 10 represents excellent performance. For each score, include
+a short explanation or justification (1-2 sentences) explaining why that
+score was given. The aspects to evaluate are as follows:
+a) Accuracy to Prompt
+Assess how well the image matches the description given in the prompt.
+Consider whether all requested elements are present and if the scene,
+objects, and setting align accurately with the text. Score: 0 (no
+alignment) to 10 (perfect match to prompt).
+b) Creativity and Originality
+Evaluate the uniqueness and creativity of the generated image. Does the
+model present an imaginative or aesthetically engaging interpretation of the
+prompt? Is there any evidence of creativity beyond a literal interpretation?
+Score: 0 (lacks creativity) to 10 (highly creative and original).
+c) Visual Quality and Realism
+Assess the overall visual quality, including resolution, detail, and realism.
+Look for coherence in lighting, shading, and perspective. Even if the image
+is stylized or abstract, judge whether the visual elements are well-rendered
+and visually appealing. Score: 0 (poor quality) to 10 (high-quality and
+realistic).
+d) Consistency and Cohesion
+Check for internal consistency within the image. Are all elements cohesive
+and aligned with the prompt? For instance, does the perspective make sense,
+and do objects fit naturally within the scene without visual anomalies?
+Score: 0 (inconsistent) to 10 (fully cohesive and consistent).
+e) Emotional or Thematic Resonance
+Evaluate how well the image evokes the intended emotional or thematic tone of
+the prompt. For example, if the prompt is meant to be serene, does the image
+convey calmness? If it’s adventurous, does it evoke excitement? Score: 0
+(no resonance) to 10 (strong resonance with the prompt’s theme).
+2. Overall Score
+After scoring each aspect individually, provide an overall score,
+representing the model’s general performance on this image. This should be
+a weighted average based on the importance of each aspect to the prompt or an
+average of all aspects.
+"""
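
The prompt leaves the overall score to the model, as either a weighted average or a plain average of the five aspects. The sketch below works through both options using the aspect scores recorded in the cached log.csv from this commit; the helper names and the example weights are assumptions made for illustration, not something the prompt or the app defines:

```python
ASPECTS = [
    "accuracy_to_prompt",
    "creativity_and_originality",
    "visual_quality_and_realism",
    "consistency_and_cohesion",
    "emotional_and_thematic_resonance",
]


def mean_overall(scores: dict) -> float:
    """Plain average of the five aspect scores, rounded to one decimal."""
    return round(sum(scores[a] for a in ASPECTS) / len(ASPECTS), 1)


def weighted_overall(scores: dict, weights: dict) -> float:
    """Weighted average; `weights` maps each aspect to a non-negative weight."""
    total = sum(weights[a] for a in ASPECTS)
    return round(sum(scores[a] * weights[a] for a in ASPECTS) / total, 1)


# Aspect scores taken from the two cached examples in log.csv.
car = dict(zip(ASPECTS, [9.0, 7.0, 9.0, 9.0, 8.0]))
creature = dict(zip(ASPECTS, [8.0, 6.0, 9.0, 9.0, 7.0]))

# Hypothetical weighting that counts prompt accuracy twice.
weights = {a: 1.0 for a in ASPECTS}
weights["accuracy_to_prompt"] = 2.0

print(mean_overall(car))               # 8.4
print(mean_overall(creature))          # 7.8
print(weighted_overall(car, weights))  # 8.5
```

For the car example the plain average (8.4) equals the logged `overall_score`, while for the green creature the plain average is 7.8 against a logged 7.5, so the model does not appear to apply a simple mean in every case.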