---
title: Gradio Image Code
emoji: ๐ŸŒ–
colorFrom: pink
colorTo: yellow
sdk: gradio
sdk_version: 5.32.1
app_file: app.py
pinned: false
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference


# ๐Ÿง  Qwen + DeepSeek Gradio App

A Gradio web app that demonstrates:
- **Image Captioning** using [Qwen-VL-Chat-Int4](https://huggingface.co/Qwen/Qwen-VL-Chat-Int4)
- **Code Generation** using [DeepSeek-R1-Distill-Qwen-1.5B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B)

This app is tested and runs efficiently on **Kaggle notebooks** with **T4 x2 GPU accelerators**.

> โš ๏ธ **Note:** Colab is not recommended for this project because downloading the `Qwen-VL-Chat-Int4` model takes a long time and often fails. Kaggle is faster and more stable.

---
## ๐Ÿš€ Features

- ๐Ÿ–ผ๏ธ Vision-Language tab: Upload an image + custom prompt โ†’ generate short description
- ๐Ÿ’ป Code Generator tab: Write a prompt โ†’ get streaming code output
- Adjustable decoding parameters: temperature, top-p, max_new_tokens

---

## ๐Ÿงฉ Installation
```bash
pip install transformers
pip install gradio
pip install transformers_stream_generator optimum auto-gptq
```

Ensure your runtime supports GPU (e.g., Kaggle or a local CUDA environment).

---

## ๐Ÿ“ฆ Model Details

### 1. Qwen-VL-Chat-Int4 (Image-to-Text)

- Used for concise image descriptions.
- Streaming output with `TextIteratorStreamer`.
- Prompt format:

```
<|system|>
You are a helpful assistant that describes images very concisely...
<|end|>
<|user|>
Describe the image...
<|end|>
<|assistant|>
```
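The prompt above can be assembled with a small helper (a sketch; `build_vl_prompt` is a hypothetical name, not part of the app's code — the tag layout follows the format shown):

```python
def build_vl_prompt(user_prompt: str) -> str:
    """Assemble a chat prompt in the tag format shown above.

    Ending with <|assistant|> marks the model's turn, which helps
    keep it from continuing the user's text instead of answering.
    """
    system = "You are a helpful assistant that describes images very concisely."
    return (
        f"<|system|>\n{system}\n<|end|>\n"
        f"<|user|>\n{user_prompt}\n<|end|>\n"
        f"<|assistant|>\n"
    )

prompt = build_vl_prompt("Describe the image in one sentence.")
```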

#### ๐Ÿ”ง Prompt Engineering Insight

- Without the `<|assistant|>` tag, the model sometimes overwrites the prompt or fails to complete its response.
- Adding `<|assistant|>` clearly marks the model's turn, reducing hallucinations.
- **Temperature is capped at ~1.0**, because higher values (e.g., 1.2+) lead to creative but false outputs.

### 2. DeepSeek-R1-Distill-Qwen-1.5B (Text-to-Code)

- Generates Python or other code from natural language prompts.
- Uses chat-based prompting with:
  - `<think>...</think>` block for reasoning.
  - Final answer separated to improve clarity.
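One way to separate the reasoning block from the final answer is a small parser (a sketch; `split_think` is a hypothetical helper, assuming the model emits at most one `<think>...</think>` block):

```python
import re

def split_think(output: str) -> tuple[str, str]:
    """Split model output into (reasoning, final_answer).

    Assumes at most one <think>...</think> block; if none is found,
    the whole output is treated as the final answer.
    """
    match = re.search(r"<think>(.*?)</think>", output, flags=re.DOTALL)
    if match is None:
        return "", output.strip()
    reasoning = match.group(1).strip()
    answer = output[match.end():].strip()
    return reasoning, answer

reasoning, answer = split_think(
    "<think>Reverse with slicing.</think>\nprint('hello'[::-1])"
)
```

A split like this is also what would feed the planned separate "thinking" UI tab.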

#### ๐Ÿ”ง Prompt Engineering Insight

- Initially used no system prompt โ†’ vague reasoning.
- Adding a system prompt improved guidance.
- Separating "thinking" and "final answer" boosted relevance.
- Future improvement: split thinking and answer into **separate UI tabs**.


## ๐Ÿ–ผ๏ธ Usage: Image Description Tab

- Upload an image.
- Write a natural prompt (e.g., "What is in this picture?")
- Adjust:
  - `Temperature`: Higher = more creative output; keep at or below ~1.0 for stability.
  - `Top-p`: Controls sampling diversity.
  - `Max new tokens`: Maximum length of the generated description.
- Click **Generate** โ†’ streaming description appears.
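The slider values can be collected into generation kwargs, clamping temperature as noted earlier (a sketch; `make_gen_kwargs` is a hypothetical helper, and the 1.0 cap follows the prompt-engineering note above):

```python
def make_gen_kwargs(temperature: float, top_p: float, max_new_tokens: int) -> dict:
    """Build kwargs for model.generate(), capping temperature at 1.0,
    since higher values tend to produce confident but false captions."""
    return {
        "temperature": min(max(temperature, 0.01), 1.0),  # clamp to (0, 1.0]
        "top_p": min(max(top_p, 0.0), 1.0),
        "max_new_tokens": int(max_new_tokens),
        "do_sample": True,
    }

kwargs = make_gen_kwargs(temperature=1.5, top_p=0.9, max_new_tokens=64)
```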


## ๐Ÿ’ป Usage: Code Generation Tab

- Write a programming task (e.g., "Write Python code to reverse a string.")
- Adjust generation settings as above.
- Streaming output displays generated code.
- Generation may stop early on a vague prompt → clarify the prompt to improve results.
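Since the streamed reply often wraps code in markdown fences, a small extractor can pull out just the code (a sketch; `extract_code` is a hypothetical helper, assuming a single fenced block — the fence string is built programmatically only to keep this README's own markdown valid):

```python
import re

FENCE = "`" * 3  # a literal triple-backtick fence

def extract_code(markdown_text: str) -> str:
    """Return the contents of the first fenced code block,
    or the whole text (stripped) if no fence is present."""
    pattern = FENCE + r"(?:\w+)?\n(.*?)" + FENCE
    match = re.search(pattern, markdown_text, flags=re.DOTALL)
    if match is None:
        return markdown_text.strip()
    return match.group(1).strip()

reply = "Here you go:\n" + FENCE + "python\nprint('hi')\n" + FENCE
code = extract_code(reply)
```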


## ๐Ÿง  Future Work

- Add a **separate tab** for model โ€œthinkingโ€ (`<think>...</think>`) versus final code.
- Optional logging for input-output pairs to track hallucinations or failures.
- Add Markdown rendering for image descriptions.