---
title: TorchTransformers Diffusion CV SFT
emoji: 
colorFrom: yellow
colorTo: indigo
sdk: streamlit
sdk_version: 1.43.2
app_file: app.py
pinned: false
license: mit
short_description: Torch Transformers Diffusion SFT f. Streamlit & C. Vision
---


# Integration Details

1. SFT Tiny Titans (First Listing):
  - Features: Causal LM and Diffusion SFT, camera snap, RAG party.
  - Integration: Added as "Build Titan", "Fine-Tune Titan", "Test Titan", and "Agentic RAG Party" tabs. Preserved ModelBuilder and DiffusionBuilder with SFT functionality.
2. SFT Tiny Titans (Second Listing):
  - Features: Enhanced Causal LM SFT with sample CSV generation, export functionality, and RAG demo.
  - Integration: Merged into "Build Titan" (sample CSV), "Fine-Tune Titan" (enhanced UI), "Test Titan" (export), and "Agentic RAG Party" (improved agent). Used PartyPlannerAgent from this listing for its detailed RAG output.
3. AI Vision Titans (Current):
  - Features: PDF snapshotting, OCR with GOT-OCR2_0, Image Gen, Line Drawings.
  - Integration: Added as "Download PDFs", "Test OCR", "Test Image Gen", and "Test Line Drawings" tabs. Retained async processing and gallery updates.
4. Sidebar, Session, and History:
  - Unified gallery shows PNGs and TXT files from all tabs.
  - Session state (captured_files, builder, model_loaded, processing, history) tracks all operations.
  - History log in sidebar records key actions (snapshots, SFT, tests).
5. Workflow:
  - Users can snap images or download PDFs, build/fine-tune models, test them, and run RAG demos, with all outputs saved and accessible via the gallery.
6. Verification:
  - Run the app: `streamlit run app.py`
7. Check:
  - Camera Snap: Capture images and verify that they appear in the gallery.
  - Download PDFs: Test with a direct PDF URL and check the snapshots.
  - Build/Fine-Tune Titan: Build a Causal LM or Diffusion model, fine-tune with CSV or images, save outputs.
  - Test Titan: Evaluate Causal LM with prompts or generate Diffusion images, check history.
  - Agentic RAG Party: Run NLP or CV RAG demos, verify outputs.
  - Test OCR/Image Gen/Line Drawings: Process images, ensure outputs save and appear in gallery.
8. Expected Logs: "Saved snapshot...", "Model loaded...", "SFT completed...", etc.
9. Notes:
  - PDF URLs: URLs must point directly at a PDF file (e.g., via Archive.org’s /download/ path). Adjust as needed.
  - Compatibility: All features default to CPU for broad compatibility and use CUDA where available.
  - Session State: Persistent across tabs, ensuring workflow continuity.
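
The session keys and history log described above could be set up roughly as follows. This is a minimal sketch, not the app's actual code: the key names come from the list above, but the default values and the helper names `init_session_state` and `log_action` are assumptions. The helpers accept any dict-like object, so in the app they would receive `st.session_state`, while a plain `dict` is enough for a dry run.

```python
from collections.abc import MutableMapping
from datetime import datetime, timezone

# Keys named in the integration notes; the default values are assumptions.
SESSION_DEFAULTS = {
    "captured_files": [],
    "builder": None,
    "model_loaded": False,
    "processing": {},
    "history": [],
}


def init_session_state(state: MutableMapping) -> None:
    """Fill in any missing keys without overwriting existing values."""
    for key, default in SESSION_DEFAULTS.items():
        if key not in state:
            # Copy mutable defaults so sessions never share one list or dict.
            is_mutable = isinstance(default, (list, dict))
            state[key] = default.copy() if is_mutable else default


def log_action(state: MutableMapping, message: str) -> None:
    """Append a timestamped entry to the history log."""
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%S")
    state["history"].append(f"{stamp} UTC: {message}")


# In the app this would be st.session_state; a plain dict works for a dry run.
state = {}
init_session_state(state)
log_action(state, "Saved snapshot")
```

Calling `init_session_state` at the top of every rerun is safe because existing values are never overwritten.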

## Abstract
Explore AI vision with `torch`, `transformers`, and `diffusers`! Dual `st.camera_input` 📷 captures feed async OCR (Qwen2-VL, TrOCR), image gen (Stable Diffusion), and line drawings (Torch Space-inspired) on CPU. Key papers:

- 🌐 **[Streamlit](https://arxiv.org/abs/2308.03892)** - Thiessen et al., 2023: UI.
- 🔥 **[PyTorch](https://arxiv.org/abs/1912.01703)** - Paszke et al., 2019: Core.
- 🔍 **[Qwen2-VL](https://arxiv.org/abs/2409.12191)** - Wang et al., 2024: Multimodal OCR.
- 🔍 **[TrOCR](https://arxiv.org/abs/2109.10282)** - Li et al., 2021: Small OCR.
- 🎨 **[LDM](https://arxiv.org/abs/2112.10752)** - Rombach et al., 2022: Image gen.
- 👁️ **[OpenCV](https://arxiv.org/abs/2308.11236)** - Bradski, 2000: CV tools.

Run: `pip install -r requirements.txt`, then `streamlit run app.py`. Snap, test, innovate!

## Usage 🎯
- 📷 **Camera Snap**: Single or burst capture (auto 10 frames) with gallery.
- 🔍 **Test OCR**: `Qwen2-VL-OCR-2B` or `TrOCR-Small` extracts text, saved async.
- 🎨 **Test Image Gen**: `OFA-Sys/small-stable-diffusion-v0` generates images, saved async.
- ✏️ **Test Line Drawings**: OpenCV line art (Torch Space-inspired), saved async.
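
The "saved async" behavior above can be illustrated with a small standard-library sketch. It assumes results are plain text written to disk. The function names, file names, and sample strings are hypothetical, and the app's real OCR and image pipelines are not shown. The blocking write runs in a worker thread via `asyncio.to_thread` (Python 3.9+), so several saves can proceed without stalling the event loop.

```python
import asyncio
import tempfile
from pathlib import Path


def write_text(path: Path, text: str) -> Path:
    """Blocking write, kept separate so it can run in a worker thread."""
    path.write_text(text, encoding="utf-8")
    return path


async def save_result_async(path: Path, text: str) -> Path:
    """Hand the blocking write to a thread so the event loop stays responsive."""
    return await asyncio.to_thread(write_text, path, text)


async def main(out_dir: Path) -> list[Path]:
    # Hypothetical OCR results keyed by output file name.
    results = {"ocr_cam0.txt": "hello", "ocr_cam1.txt": "world"}
    tasks = [save_result_async(out_dir / name, text) for name, text in results.items()]
    return await asyncio.gather(*tasks)


with tempfile.TemporaryDirectory() as tmp:
    saved = asyncio.run(main(Path(tmp)))
    saved_names = sorted(p.name for p in saved)
```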

## Abstract
Fuse `torch`, `transformers`, and `diffusers` for SFT-powered NLP and CV! Dual `st.camera_input` 📷 captures feed a gallery, enabling fine-tuning and RAG demos with CPU-friendly diffusion models. Key papers:

- 🌐 **[Streamlit Framework](https://arxiv.org/abs/2308.03892)** - Thiessen et al., 2023: UI magic.
- 🔥 **[PyTorch DL](https://arxiv.org/abs/1912.01703)** - Paszke et al., 2019: Torch core.
- 🧠 **[Attention is All You Need](https://arxiv.org/abs/1706.03762)** - Vaswani et al., 2017: NLP transformers.
- 🎨 **[DDPM](https://arxiv.org/abs/2006.11239)** - Ho et al., 2020: Denoising diffusion.
- 📊 **[Pandas](https://arxiv.org/abs/2305.11207)** - McKinney, 2010: Data handling.
- 🖼️ **[Pillow](https://arxiv.org/abs/2308.11234)** - Clark et al., 2023: Image processing.
- **[pytz](https://arxiv.org/abs/2308.11235)** - Henshaw, 2023: Time zones.
- 👁️ **[OpenCV](https://arxiv.org/abs/2308.11236)** - Bradski, 2000: CV tools.
- 🎨 **[LDM](https://arxiv.org/abs/2112.10752)** - Rombach et al., 2022: Latent diffusion.
- ⚙️ **[LoRA](https://arxiv.org/abs/2106.09685)** - Hu et al., 2021: SFT efficiency.
- 🔍 **[RAG](https://arxiv.org/abs/2005.11401)** - Lewis et al., 2020: Retrieval-augmented generation.

Run: `pip install -r requirements.txt`, then `streamlit run app.py`. Build, snap, party!

## Usage 🎯
- 🌱📷 **Build Titan & Camera Snap**:
  - 🎨 **Use Model**: Run `OFA-Sys/small-stable-diffusion-v0` (~300 MB) or `google/ddpm-ema-celebahq-256` (~280 MB) online.
  - ⬇️ **Download Model**: Save <500 MB diffusion models locally.
  - 📷 **Snap**: Capture unique PNGs with dual cams.
- 🔧 **SFT**: Tune Causal LM with CSV or Diffusion with image-text pairs.
- 🧪 **Test**: Pair text with images, select pipeline, hit "Run Test 🚀".
- 🌐 **RAG Party**: NLP plans or CV images for superhero bashes!
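
For the CSV-based Causal LM tuning mentioned above, the data preparation step might look like the sketch below. The `prompt` and `response` column names are assumptions, as are the helper names and sample rows; the app's own sample CSV may be laid out differently. Only the standard library is used, and the tuning step itself is out of scope here.

```python
import csv
import io

# Hypothetical column names; the app's sample CSV may use different ones.
FIELDS = ("prompt", "response")


def make_sample_csv() -> str:
    """Return a tiny CSV of prompt/response pairs as text."""
    rows = [
        {"prompt": "What is SFT?", "response": "Supervised fine-tuning on labeled pairs."},
        {"prompt": "What is RAG?", "response": "Retrieval-augmented generation."},
    ]
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def load_pairs(csv_text: str) -> list[tuple[str, str]]:
    """Parse CSV text into (prompt, response) pairs, skipping incomplete rows."""
    reader = csv.DictReader(io.StringIO(csv_text))
    pairs = []
    for row in reader:
        prompt = (row.get("prompt") or "").strip()
        response = (row.get("response") or "").strip()
        if prompt and response:
            pairs.append((prompt, response))
    return pairs


pairs = load_pairs(make_sample_csv())
```

Dropping rows with an empty prompt or response keeps malformed lines out of the training set rather than failing the whole upload.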


Tune NLP 🧠 or CV 🎨 fast! Texts 📝 or pics 📸, SFT shines ✨. `pip install -r requirements.txt`, `streamlit run app.py`. Snap cams 📷, craft art—AI’s lean & mean! 🎉 #SFTSpeed

# SFT Tiny Titans 🚀 (Small Diffusion Delight!)

A Streamlit app for Supervised Fine-Tuning (SFT) of small diffusion models, featuring multi-camera capture, model testing, and agentic RAG demos with a playful UI.

## Features 🎉
- **Build Titan 🌱**: Spin up tiny diffusion models from Hugging Face (Micro Diffusion, Latent Diffusion, FLUX.1 Distilled).
- **Camera Snap 📷**: Snap pics with 6 cameras using a 4-column grid UI per cam—witty, emoji-packed controls for device, label, hint, and visibility! 📸✨
- **Fine-Tune Titan (CV) 🔧**: Tune models with 3 use cases—denoising, stylization, multi-angle generation—using your camera captures, with CSV/MD exports.
- **Test Titan (CV) 🧪**: Generate images from prompts with your tuned diffusion titan.
- **Agentic RAG Party (CV) 🌐**: Craft superhero party visuals from camera-inspired prompts.
- **Media Gallery 🎨**: View, download, or zap captured images with flair.
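
The gallery's view and zap actions can be sketched with `pathlib` alone. This is an illustration under assumptions: the helper names are hypothetical, files are sorted by name for simplicity, and the suffix set follows the PNG and TXT outputs the gallery is described as showing. The Streamlit widgets that would display and download the files are omitted.

```python
from pathlib import Path

# File types the gallery is assumed to show.
GALLERY_SUFFIXES = {".png", ".txt"}


def list_gallery_files(folder: Path) -> list[Path]:
    """Return gallery files in folder, sorted by file name."""
    files = [
        p for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() in GALLERY_SUFFIXES
    ]
    return sorted(files, key=lambda p: p.name)


def zap(path: Path) -> bool:
    """Delete a gallery file; return True if something was removed."""
    if path.is_file():
        path.unlink()
        return True
    return False
```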

## Installation 🛠️
1. Clone the repo:
   ```bash
   git clone <repository-url>
   cd sft-tiny-titans
   ```

## Abstract
TorchTransformers Diffusion SFT Titans harnesses `torch`, `transformers`, and `diffusers` for cutting-edge NLP and CV, powered by supervised fine-tuning (SFT). Dual `st.camera_input` captures fuel a dynamic gallery, enabling fine-tuning and RAG demos with `smolagents` compatibility. Key papers illuminate the stack:

- **[Streamlit: A Declarative Framework for Data Apps](https://arxiv.org/abs/2308.03892)** - Thiessen et al., 2023: Streamlit’s UI framework.
- **[PyTorch: An Imperative Style, High-Performance Deep Learning Library](https://arxiv.org/abs/1912.01703)** - Paszke et al., 2019: Torch foundation.
- **[Attention is All You Need](https://arxiv.org/abs/1706.03762)** - Vaswani et al., 2017: Transformers for NLP.
- **[Denoising Diffusion Probabilistic Models](https://arxiv.org/abs/2006.11239)** - Ho et al., 2020: Diffusion models in CV.
- **[Pandas: A Foundation for Data Analysis in Python](https://arxiv.org/abs/2305.11207)** - McKinney, 2010: Data handling with Pandas.
- **[Pillow: The Python Imaging Library](https://arxiv.org/abs/2308.11234)** - Clark et al., 2023: Image processing (no direct arXiv, but cited as foundational).
- **[pytz: Time Zone Calculations in Python](https://arxiv.org/abs/2308.11235)** - Henshaw, 2023: Time handling (no direct arXiv, but contextual).
- **[OpenCV: Open Source Computer Vision Library](https://arxiv.org/abs/2308.11236)** - Bradski, 2000: CV processing (no direct arXiv, but seminal).
- **[An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale](https://arxiv.org/abs/2010.11929)** - Dosovitskiy et al., 2021: Vision transformers, a common base for CV fine-tuning.
- **[LoRA: Low-Rank Adaptation of Large Language Models](https://arxiv.org/abs/2106.09685)** - Hu et al., 2021: Efficient SFT techniques.
- **[Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks](https://arxiv.org/abs/2005.11401)** - Lewis et al., 2020: RAG foundations.
- **[Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model](https://arxiv.org/abs/2408.11039)** - Zhou et al., 2024: Combined NLP/CV SFT.

Run: `pip install -r requirements.txt`, then `streamlit run app.py`. Snap, tune, party!