---
title: Gradio Image Code
emoji: 🌖
colorFrom: pink
colorTo: yellow
sdk: gradio
sdk_version: 5.32.1
app_file: app.py
pinned: false
---
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
# 🧠 Qwen + DeepSeek Gradio App
A Gradio web app that demonstrates:
- **Image Captioning** using [Qwen-VL-Chat-Int4](https://huggingface.co/Qwen/Qwen-VL-Chat-Int4)
- **Code Generation** using [DeepSeek-R1-Distill-Qwen-1.5B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B)
This app is tested and runs efficiently on **Kaggle notebooks** with **T4 x2 GPU accelerators**.
> โš ๏ธ **Note:** Colab is not recommended for this project because downloading the `Qwen-VL-Chat-Int4` model takes a long time and often fails. Kaggle is faster and more stable.
---
## 🚀 Features
- 🖼️ Vision-Language tab: upload an image plus a custom prompt → generate a short description
- 💻 Code Generator tab: write a prompt → get streaming code output
- Adjustable decoding parameters: temperature, top-p, max_new_tokens
---
## 🧩 Installation
```bash
pip install transformers
pip install gradio
pip install transformers_stream_generator optimum auto-gptq
```
Ensure your runtime supports GPU (e.g., a Kaggle notebook or a local CUDA environment).
---
## 📦 Model Details
### 1. Qwen-VL-Chat-Int4 (Image-to-Text)
- Used for concise image descriptions.
- Streaming output with `TextIteratorStreamer`.
- Prompt format:
```
<|system|>
You are a helpful assistant that describes images very concisely...
<|end|>
<|user|>
Describe the image...
<|end|>
<|assistant|>
```
#### 🔧 Prompt Engineering Insight
- Without the `<|assistant|>` tag, the model sometimes overwrites the prompt or fails to complete properly.
- Adding `<|assistant|>` clearly marks the model's turn, reducing hallucinations.
- **Temperature is capped at ~1.0** because higher values (e.g., 1.2+) lead to creative but false outputs.
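As a minimal sketch, the template and temperature cap above can be combined in small helpers (`build_prompt` and `clamp_temperature` are hypothetical names, not the app's actual code, and the tags follow this README rather than the model's official chat template):

```python
# Hypothetical sketch of the prompt format described above; tag names follow
# this README, not necessarily Qwen-VL-Chat-Int4's real chat template.

SYSTEM = "You are a helpful assistant that describes images very concisely."

def build_prompt(user_text: str) -> str:
    """Assemble the chat prompt, ending with <|assistant|> to mark the model's turn."""
    return (
        f"<|system|>\n{SYSTEM}\n<|end|>\n"
        f"<|user|>\n{user_text}\n<|end|>\n"
        f"<|assistant|>\n"
    )

def clamp_temperature(t: float, cap: float = 1.0) -> float:
    """Keep temperature in (0, cap] to avoid creative-but-false outputs."""
    return min(max(t, 0.0), cap)

print(build_prompt("Describe the image."))
print(clamp_temperature(1.2))  # 1.0
```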
### 2. DeepSeek-R1-Distill-Qwen-1.5B (Text-to-Code)
- Generates Python or other code from natural language prompts.
- Uses chat-based prompting with:
- `<think>...</think>` block for reasoning.
- Final answer separated to improve clarity.
#### 🔧 Prompt Engineering Insight
- Initially no system prompt was used → vague reasoning.
- Adding a system prompt improved guidance.
- Separating "thinking" from the final answer boosted relevance.
- Future improvement: split thinking and answer into **separate UI tabs**.
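The thinking/answer separation could be implemented with a small parser, e.g. (a hypothetical helper, assuming the model emits at most one `<think>...</think>` block before the answer):

```python
import re

# Hedged sketch: split a DeepSeek-style completion into its <think>...</think>
# reasoning block and the final answer. split_thinking is a hypothetical helper.

def split_thinking(text: str) -> tuple[str, str]:
    """Return (reasoning, answer); reasoning is empty if no <think> block exists."""
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if match is None:
        return "", text.strip()
    answer = text[match.end():].strip()
    return match.group(1).strip(), answer

raw = "<think>Reverse with slicing.</think>\ndef rev(s):\n    return s[::-1]"
reasoning, answer = split_thinking(raw)
print(reasoning)  # Reverse with slicing.
print(answer)
```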
## ๐Ÿ–ผ๏ธ Usage: Image Description Tab
- Upload an image.
- Write a natural prompt (e.g., "What is in this picture?")
- Adjust:
  - `Temperature`: higher = more creative, but keep it low for stability.
  - `Top-p`: controls sampling diversity.
  - `Max new tokens`: maximum length of the generated description.
- Click **Generate** → the description streams in.
## 💻 Usage: Code Generation Tab
- Write a programming task (e.g., "Write Python code to reverse a string.")
- Adjust generation settings as above.
- Streaming output displays generated code.
- If output stops early on a vague prompt → clarify the prompt to improve results.
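Gradio renders streaming output by consuming a Python generator that yields progressively longer strings; the pattern can be sketched in plain Python (a stand-in for the real `TextIteratorStreamer` loop, not the app's actual code):

```python
# Minimal stand-in for the app's streaming loop: Gradio renders each yielded
# string, so we yield the accumulated text token by token.

def stream_tokens(tokens):
    """Yield the growing output string, as a TextIteratorStreamer loop would."""
    text = ""
    for token in tokens:
        text += token
        yield text

chunks = list(stream_tokens(["def ", "rev(s): ", "return ", "s[::-1]"]))
print(chunks[-1])  # def rev(s): return s[::-1]
```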
## 🧠 Future Work
- Add a **separate tab** for model "thinking" (`<think>...</think>`) versus final code.
- Optional logging for input-output pairs to track hallucinations or failures.
- Add Markdown rendering for image descriptions.