---
title: Gradio Image Code
emoji: 🧠
colorFrom: pink
colorTo: yellow
sdk: gradio
sdk_version: 5.32.1
app_file: app.py
pinned: false
---
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
# 🧠 Qwen + DeepSeek Gradio App
A Gradio web app that demonstrates:
- **Image Captioning** using [Qwen-VL-Chat-Int4](https://huggingface.co/Qwen/Qwen-VL-Chat-Int4)
- **Code Generation** using [DeepSeek-R1-Distill-Qwen-1.5B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B)
This app is tested and runs efficiently on **Kaggle notebooks** with **T4 x2 GPU accelerators**.
> ⚠️ **Note:** Colab is not recommended for this project because downloading the `Qwen-VL-Chat-Int4` model takes a long time and often fails. Kaggle is faster and more stable.
---
## 🚀 Features
- 🖼️ Vision-Language tab: Upload an image + custom prompt → generate a short description
- 💻 Code Generator tab: Write a prompt → get streaming code output
- Adjustable decoding parameters: temperature, top-p, max_new_tokens
---
## 🧩 Installation
```bash
pip install transformers
pip install gradio
pip install transformers_stream_generator optimum auto-gptq
```
Ensure your runtime has a GPU (e.g., Kaggle or a local CUDA environment).
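Before launching the app, it can help to verify the dependencies above are actually importable. The following is a minimal sketch; `missing_packages` is a hypothetical helper, not part of the app:

```python
import importlib.util

def missing_packages(names):
    """Return the subset of package names that are not importable (hypothetical helper)."""
    return [n for n in names if importlib.util.find_spec(n) is None]

# Top-level import names for the packages installed above
required = ["transformers", "gradio", "optimum", "auto_gptq"]
print("Missing:", missing_packages(required))
```

An empty list means the install step succeeded.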
---
## 📦 Model Details
### 1. Qwen-VL-Chat-Int4 (Image-to-Text)
- Used for concise image descriptions.
- Streaming output with `TextIteratorStreamer`.
- Prompt format:
```
<|system|>
You are a helpful assistant that describes images very concisely...
<|end|>
<|user|>
Describe the image...
<|end|>
<|assistant|>
```
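The prompt above can be assembled with plain string formatting. A minimal sketch, assuming the exact tag names shown in this README (`build_caption_prompt` is a hypothetical helper):

```python
def build_caption_prompt(user_request: str) -> str:
    """Assemble the chat-style prompt shown above (tags as used in this app)."""
    system = "You are a helpful assistant that describes images very concisely."
    return (
        "<|system|>\n" + system + "\n<|end|>\n"
        + "<|user|>\n" + user_request + "\n<|end|>\n"
        + "<|assistant|>\n"  # closing tag marks the model's turn
    )

print(build_caption_prompt("Describe the image."))
```

Ending the prompt with `<|assistant|>` is what signals the model to start its own turn, per the insight below.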
#### 🧠 Prompt Engineering Insight
- Without the `<|assistant|>` tag, the model sometimes continues the user's turn or fails to complete properly.
- Adding `<|assistant|>` clearly marks the model's turn, reducing hallucinations.
- **Temperature capped to ~1.0** because higher values (e.g., 1.2+) lead to creative but false outputs.
### 2. DeepSeek-R1-Distill-Qwen-1.5B (Text-to-Code)
- Generates Python or other code from natural language prompts.
- Uses chat-based prompting with:
- `<think>...</think>` block for reasoning.
- Final answer separated to improve clarity.
#### 🧠 Prompt Engineering Insight
- Initially used no system prompt → vague reasoning.
- Adding a system prompt improved guidance.
- Separating "thinking" and "final answer" boosted relevance.
- Future improvement: split thinking and answer into **separate UI tabs**.
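Separating the reasoning from the final answer comes down to splitting on the `<think>...</think>` block. A minimal sketch (the `split_think` helper is an assumption, not the app's actual code):

```python
import re

def split_think(output: str):
    """Split model output into (reasoning, final answer) on the <think> block."""
    m = re.search(r"<think>(.*?)</think>", output, flags=re.DOTALL)
    if not m:
        # No reasoning block: treat everything as the final answer
        return "", output.strip()
    return m.group(1).strip(), output[m.end():].strip()

thinking, answer = split_think("<think>reverse via slicing</think>\nprint(s[::-1])")
```

The two pieces could then be routed to the separate UI tabs mentioned above.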
## 🖼️ Usage: Image Description Tab
- Upload an image.
- Write a natural prompt (e.g., "What is in this picture?")
- Adjust:
  - `Temperature`: Higher values are more creative; keep at or below ~1.0 for stability.
  - `Top-p`: Controls nucleus-sampling diversity.
  - `Max new tokens`: Upper bound on the length of the generated description.
- Click **Generate** → streaming description appears.
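The three sliders above map onto standard `transformers` `generate()` arguments. A sketch of how the app might clamp the temperature (per the ~1.0 cap noted earlier); `generation_kwargs` is a hypothetical helper:

```python
def generation_kwargs(temperature: float, top_p: float, max_new_tokens: int) -> dict:
    """Build sampling settings; temperature is capped at 1.0 for stability."""
    return {
        "do_sample": True,
        "temperature": min(max(temperature, 0.01), 1.0),  # clamp to (0, 1.0]
        "top_p": top_p,
        "max_new_tokens": max_new_tokens,
    }

# e.g. model.generate(**inputs, **generation_kwargs(0.7, 0.9, 128))
```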
## 💻 Usage: Code Generation Tab
- Write a programming task (e.g., "Write Python code to reverse a string.")
- Adjust generation settings as above.
- Streaming output displays generated code.
- Generation may stop early on vague prompts → rephrase the prompt to improve results.
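The streaming display follows the usual Gradio pattern: a generator yields the accumulated text after each new chunk. A minimal sketch with plain strings standing in for the tokens that `TextIteratorStreamer` would produce:

```python
def stream_response(chunks):
    """Yield the running text after each chunk, Gradio-streaming style."""
    text = ""
    for piece in chunks:
        text += piece
        yield text  # Gradio re-renders the output box with each yield

# Simulated token stream (real chunks come from TextIteratorStreamer)
for partial in stream_response(["def ", "rev(s): ", "return s[::-1]"]):
    print(partial)
```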
## 🚧 Future Work
- Add a **separate tab** for model "thinking" (`<think>...</think>`) versus final code.
- Optional logging for input-output pairs to track hallucinations or failures.
- Add Markdown rendering for image descriptions.