---
title: Gradio Image Code
emoji: ๐ŸŒ–
colorFrom: pink
colorTo: yellow
sdk: gradio
sdk_version: 5.32.1
app_file: app.py
pinned: false
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference


# ๐Ÿง  Qwen + DeepSeek Gradio App

A Gradio web app that demonstrates:
- **Image Captioning** using [Qwen-VL-Chat-Int4](https://huggingface.co/Qwen/Qwen-VL-Chat-Int4)
- **Code Generation** using [DeepSeek-R1-Distill-Qwen-1.5B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B)

This app is tested and runs efficiently on **Kaggle notebooks** with **T4 x2 GPU accelerators**.

> โš ๏ธ **Note:** Colab is not recommended for this project because downloading the `Qwen-VL-Chat-Int4` model takes a long time and often fails. Kaggle is faster and more stable.

---
## ๐Ÿš€ Features

- ๐Ÿ–ผ๏ธ Vision-Language tab: Upload an image + custom prompt โ†’ generate short description
- ๐Ÿ’ป Code Generator tab: Write a prompt โ†’ get streaming code output
- Adjustable decoding parameters: temperature, top-p, max_new_tokens

---

## ๐Ÿงฉ Installation
```bash
pip install transformers
pip install gradio
pip install transformers_stream_generator optimum auto-gptq
```

Ensure your runtime supports GPU (e.g., Kaggle or a local CUDA environment).

---

## ๐Ÿ“ฆ Model Details

### 1. Qwen-VL-Chat-Int4 (Image-to-Text)

- Used for concise image descriptions.
- Streaming output with `TextIteratorStreamer`.
- Prompt format:

```
<|system|>
You are a helpful assistant that describes images very concisely...
<|end|>
<|user|>
Describe the image...
<|end|>
<|assistant|>
```
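The prompt above can be assembled with a small helper (a sketch; `build_vl_prompt` is a hypothetical name, not part of the app's code — the tag layout follows the format shown):

```python
def build_vl_prompt(user_prompt: str) -> str:
    """Assemble a chat prompt in the tag format shown above.

    Ending with <|assistant|> marks the model's turn, which helps
    keep it from continuing the user's text instead of answering.
    """
    system = "You are a helpful assistant that describes images very concisely."
    return (
        f"<|system|>\n{system}\n<|end|>\n"
        f"<|user|>\n{user_prompt}\n<|end|>\n"
        f"<|assistant|>\n"
    )

prompt = build_vl_prompt("Describe the image in one sentence.")
```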

#### ๐Ÿ”ง Prompt Engineering Insight

- Without the `<|assistant|>` tag, the model sometimes overwrites the prompt or fails to complete its response.
- Adding `<|assistant|>` clearly marks the model's turn, reducing hallucinations.
- **Temperature is capped at ~1.0**, because higher values (e.g., 1.2+) lead to creative but false outputs.

### 2. DeepSeek-R1-Distill-Qwen-1.5B (Text-to-Code)

- Generates Python or other code from natural language prompts.
- Uses chat-based prompting with:
  - `<think>...</think>` block for reasoning.
  - Final answer separated to improve clarity.
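One way to separate the reasoning block from the final answer is a small parser (a sketch; `split_think` is a hypothetical helper, assuming the model emits at most one `<think>...</think>` block):

```python
import re

def split_think(output: str) -> tuple[str, str]:
    """Split model output into (reasoning, final_answer).

    Assumes at most one <think>...</think> block; if none is found,
    the whole output is treated as the final answer.
    """
    match = re.search(r"<think>(.*?)</think>", output, flags=re.DOTALL)
    if match is None:
        return "", output.strip()
    reasoning = match.group(1).strip()
    answer = output[match.end():].strip()
    return reasoning, answer

reasoning, answer = split_think(
    "<think>Reverse with slicing.</think>\nprint('hello'[::-1])"
)
```

A split like this is also what would feed the planned separate "thinking" UI tab.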

#### ๐Ÿ”ง Prompt Engineering Insight

- Initially used no system prompt โ†’ vague reasoning.
- Adding a system prompt improved guidance.
- Separating "thinking" and "final answer" boosted relevance.
- Future improvement: split thinking and answer into **separate UI tabs**.


## ๐Ÿ–ผ๏ธ Usage: Image Description Tab

- Upload an image.
- Write a natural prompt (e.g., "What is in this picture?")
- Adjust:
  - `Temperature`: Higher = more creative output; keep at or below ~1.0 for stability.
  - `Top-p`: Controls sampling diversity.
  - `Max new tokens`: Maximum length of the generated description.
- Click **Generate** โ†’ streaming description appears.
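The slider values can be collected into generation kwargs, clamping temperature as noted earlier (a sketch; `make_gen_kwargs` is a hypothetical helper, and the 1.0 cap follows the prompt-engineering note above):

```python
def make_gen_kwargs(temperature: float, top_p: float, max_new_tokens: int) -> dict:
    """Build kwargs for model.generate(), capping temperature at 1.0,
    since higher values tend to produce confident but false captions."""
    return {
        "temperature": min(max(temperature, 0.01), 1.0),  # clamp to (0, 1.0]
        "top_p": min(max(top_p, 0.0), 1.0),
        "max_new_tokens": int(max_new_tokens),
        "do_sample": True,
    }

kwargs = make_gen_kwargs(temperature=1.5, top_p=0.9, max_new_tokens=64)
```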


## ๐Ÿ’ป Usage: Code Generation Tab

- Write a programming task (e.g., "Write Python code to reverse a string.")
- Adjust generation settings as above.
- Streaming output displays generated code.
- Generation may stop early on a vague prompt → clarify the prompt to improve results.
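Since the streamed reply often wraps code in markdown fences, a small extractor can pull out just the code (a sketch; `extract_code` is a hypothetical helper, assuming a single fenced block — the fence string is built programmatically only to keep this README's own markdown valid):

```python
import re

FENCE = "`" * 3  # a literal triple-backtick fence

def extract_code(markdown_text: str) -> str:
    """Return the contents of the first fenced code block,
    or the whole text (stripped) if no fence is present."""
    pattern = FENCE + r"(?:\w+)?\n(.*?)" + FENCE
    match = re.search(pattern, markdown_text, flags=re.DOTALL)
    if match is None:
        return markdown_text.strip()
    return match.group(1).strip()

reply = "Here you go:\n" + FENCE + "python\nprint('hi')\n" + FENCE
code = extract_code(reply)
```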


## ๐Ÿง  Future Work

- Add a **separate tab** for model โ€œthinkingโ€ (`<think>...</think>`) versus final code.
- Optional logging for input-output pairs to track hallucinations or failures.
- Add Markdown rendering for image descriptions.