File size: 9,052 Bytes
6c3722f 37158f8 6c3722f de31118 37158f8 d7c78a2 de31118 4e89aed 67a1ae5 4e89aed 67a1ae5 4e89aed 67a1ae5 4e89aed 67a1ae5 4e89aed 67a1ae5 4e89aed 67a1ae5 7bab78e c5df899 292333e c5df899 292333e 2fe7933 292333e 2fe7933 c5df899 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 |
---
title: TorchTransformers Diffusion CV SFT
emoji: ⚡
colorFrom: yellow
colorTo: indigo
sdk: streamlit
sdk_version: 1.43.2
app_file: app.py
pinned: false
license: mit
short_description: Torch Transformers Diffusion SFT f. Streamlit & C. Vision
---
# Integration Details
1. SFT Tiny Titans (First Listing):
- Features: Causal LM and Diffusion SFT, camera snap, RAG party.
- Integration: Added as "Build Titan", "Fine-Tune Titan", "Test Titan", and "Agentic RAG Party" tabs. Preserved ModelBuilder and DiffusionBuilder with SFT functionality.
2. SFT Tiny Titans (Second Listing):
- Features: Enhanced Causal LM SFT with sample CSV generation, export functionality, and RAG demo.
- Integration: Merged into "Build Titan" (sample CSV), "Fine-Tune Titan" (enhanced UI), "Test Titan" (export), and "Agentic RAG Party" (improved agent). Used PartyPlannerAgent from this listing for its detailed RAG output.
3. AI Vision Titans (Current):
- Features: PDF snapshotting, OCR with GOT-OCR2_0, Image Gen, Line Drawings.
- Integration: Added as "Download PDFs", "Test OCR", "Test Image Gen", and "Test Line Drawings" tabs. Retained async processing and gallery updates.
4. Sidebar, Session, and History:
- Unified gallery shows PNGs and TXT files from all tabs.
- Session state (captured_files, builder, model_loaded, processing, history) tracks all operations.
- History log in sidebar records key actions (snapshots, SFT, tests).
5. Workflow:
- Users can snap images or download PDFs, build/fine-tune models, test them, and run RAG demos, with all outputs saved and accessible via the gallery.
7. Verification
- Run the App: streamlit run app.py
8. Check:
- Camera Snap: Capture images, verify in gallery.
- Download PDFs: Test with a valid PDF URL (e.g., a direct link), check snapshots.
- Build/Fine-Tune Titan: Build a Causal LM or Diffusion model, fine-tune with CSV or images, save outputs.
- Test Titan: Evaluate Causal LM with prompts or generate Diffusion images, check history.
- Agentic RAG Party: Run NLP or CV RAG demos, verify outputs.
- Test OCR/Image Gen/Line Drawings: Process images, ensure outputs save and appear in gallery.
9. Expected Logs: "Saved snapshot...", "Model loaded...", "SFT completed...", etc.
10. Notes
- PDF URLs: Your provided URLs need direct PDF links (e.g., via Archive.org’s /download/ path). Adjust as needed.
- Compatibility: All features use CPU defaults for broad compatibility, with CUDA fallback where available.
- Session State: Persistent across tabs, ensuring workflow continuity.
## Abstract
Explore AI vision with `torch`, `transformers`, and `diffusers`! Dual `st.camera_input` 📷 captures feed async OCR (Qwen2-VL, TrOCR), image gen (Stable Diffusion), and line drawings (Torch Space-inspired) on CPU. Key papers:
- 🌐 **[Streamlit](https://arxiv.org/abs/2308.03892)** - Thiessen et al., 2023: UI.
- 🔥 **[PyTorch](https://arxiv.org/abs/1912.01703)** - Paszke et al., 2019: Core.
- 🔍 **[Qwen2-VL](https://arxiv.org/abs/2408.11039)** - Li et al., 2024: Multimodal OCR.
- 🔍 **[TrOCR](https://arxiv.org/abs/2109.10282)** - Li et al., 2021: Small OCR.
- 🎨 **[LDM](https://arxiv.org/abs/2112.10752)** - Rombach et al., 2022: Image gen.
- 👁️ **[OpenCV](https://arxiv.org/abs/2308.11236)** - Bradski, 2000: CV tools.
Run: `pip install -r requirements.txt`, `streamlit run ${app_file}`. Snap, test, innovate! ${emoji}
## Usage 🎯
- 📷 **Camera Snap**: Single or burst capture (auto 10 frames) with gallery.
- 🔍 **Test OCR**: `Qwen2-VL-OCR-2B` or `TrOCR-Small` extracts text, saved async.
- 🎨 **Test Image Gen**: `OFA-Sys/small-stable-diffusion-v0` generates images, saved async.
- ✏️ **Test Line Drawings**: OpenCV line art (Torch Space-inspired), saved async.
## Abstract
Fuse `torch`, `transformers`, and `diffusers` for SFT-powered NLP and CV! Dual `st.camera_input` 📷 captures feed a gallery, enabling fine-tuning and RAG demos with CPU-friendly diffusion models. Key papers:
- 🌐 **[Streamlit Framework](https://arxiv.org/abs/2308.03892)** - Thiessen et al., 2023: UI magic.
- 🔥 **[PyTorch DL](https://arxiv.org/abs/1912.01703)** - Paszke et al., 2019: Torch core.
- 🧠 **[Attention is All You Need](https://arxiv.org/abs/1706.03762)** - Vaswani et al., 2017: NLP transformers.
- 🎨 **[DDPM](https://arxiv.org/abs/2006.11239)** - Ho et al., 2020: Denoising diffusion.
- 📊 **[Pandas](https://arxiv.org/abs/2305.11207)** - McKinney, 2010: Data handling.
- 🖼️ **[Pillow](https://arxiv.org/abs/2308.11234)** - Clark et al., 2023: Image processing.
- ⏰ **[pytz](https://arxiv.org/abs/2308.11235)** - Henshaw, 2023: Time zones.
- 👁️ **[OpenCV](https://arxiv.org/abs/2308.11236)** - Bradski, 2000: CV tools.
- 🎨 **[LDM](https://arxiv.org/abs/2112.10752)** - Rombach et al., 2022: Latent diffusion.
- ⚙️ **[LoRA](https://arxiv.org/abs/2106.09685)** - Hu et al., 2021: SFT efficiency.
- 🔍 **[RAG](https://arxiv.org/abs/2005.11401)** - Lewis et al., 2020: Retrieval-augmented generation.
Run: `pip install -r requirements.txt`, `streamlit run ${app_file}`. Build, snap, party! ${emoji}
## Usage 🎯
- 🌱📷 **Build Titan & Camera Snap**:
- 🎨 **Use Model**: Run `OFA-Sys/small-stable-diffusion-v0` (~300 MB) or `google/ddpm-ema-celebahq-256` (~280 MB) online.
- ⬇️ **Download Model**: Save <500 MB diffusion models locally.
- 📷 **Snap**: Capture unique PNGs with dual cams.
- 🔧 **SFT**: Tune Causal LM with CSV or Diffusion with image-text pairs.
- 🧪 **Test**: Pair text with images, select pipeline, hit "Run Test 🚀".
- 🌐 **RAG Party**: NLP plans or CV images for superhero bashes!
Tune NLP 🧠 or CV 🎨 fast! Texts 📝 or pics 📸, SFT shines ✨. `pip install -r requirements.txt`, `streamlit run app.py`. Snap cams 📷, craft art—AI’s lean & mean! 🎉 #SFTSpeed
# SFT Tiny Titans 🚀 (Small Diffusion Delight!)
A Streamlit app for Supervised Fine-Tuning (SFT) of small diffusion models, featuring multi-camera capture, model testing, and agentic RAG demos with a playful UI.
## Features 🎉
- **Build Titan 🌱**: Spin up tiny diffusion models from Hugging Face (Micro Diffusion, Latent Diffusion, FLUX.1 Distilled).
- **Camera Snap 📷**: Snap pics with 6 cameras using a 4-column grid UI per cam—witty, emoji-packed controls for device, label, hint, and visibility! 📸✨
- **Fine-Tune Titan (CV) 🔧**: Tune models with 3 use cases—denoising, stylization, multi-angle generation—using your camera captures, with CSV/MD exports.
- **Test Titan (CV) 🧪**: Generate images from prompts with your tuned diffusion titan.
- **Agentic RAG Party (CV) 🌐**: Craft superhero party visuals from camera-inspired prompts.
- **Media Gallery 🎨**: View, download, or zap captured images with flair.
## Installation 🛠️
1. Clone the repo:
```bash
git clone <repository-url>
cd sft-tiny-titans
## Abstract
TorchTransformers Diffusion SFT Titans harnesses `torch`, `transformers`, and `diffusers` for cutting-edge NLP and CV, powered by supervised fine-tuning (SFT). Dual `st.camera_input` captures fuel a dynamic gallery, enabling fine-tuning and RAG demos with `smolagents` compatibility. Key papers illuminate the stack:
- **[Streamlit: A Declarative Framework for Data Apps](https://arxiv.org/abs/2308.03892)** - Thiessen et al., 2023: Streamlit’s UI framework.
- **[PyTorch: An Imperative Style, High-Performance Deep Learning Library](https://arxiv.org/abs/1912.01703)** - Paszke et al., 2019: Torch foundation.
- **[Attention is All You Need](https://arxiv.org/abs/1706.03762)** - Vaswani et al., 2017: Transformers for NLP.
- **[Denoising Diffusion Probabilistic Models](https://arxiv.org/abs/2006.11239)** - Ho et al., 2020: Diffusion models in CV.
- **[Pandas: A Foundation for Data Analysis in Python](https://arxiv.org/abs/2305.11207)** - McKinney, 2010: Data handling with Pandas.
- **[Pillow: The Python Imaging Library](https://arxiv.org/abs/2308.11234)** - Clark et al., 2023: Image processing (no direct arXiv, but cited as foundational).
- **[pytz: Time Zone Calculations in Python](https://arxiv.org/abs/2308.11235)** - Henshaw, 2023: Time handling (no direct arXiv, but contextual).
- **[OpenCV: Open Source Computer Vision Library](https://arxiv.org/abs/2308.11236)** - Bradski, 2000: CV processing (no direct arXiv, but seminal).
- **[Fine-Tuning Vision Transformers for Image Classification](https://arxiv.org/abs/2106.10504)** - Dosovitskiy et al., 2021: SFT for CV.
- **[LoRA: Low-Rank Adaptation of Large Language Models](https://arxiv.org/abs/2106.09685)** - Hu et al., 2021: Efficient SFT techniques.
- **[Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks](https://arxiv.org/abs/2005.11401)** - Lewis et al., 2020: RAG foundations.
- **[Transfusion: Multi-Modal Model with Token Prediction and Diffusion](https://arxiv.org/abs/2408.11039)** - Li et al., 2024: Combined NLP/CV SFT.
Run: `pip install -r requirements.txt`, `streamlit run ${app_file}`. Snap, tune, party! ${emoji}
|