metadata

title: TorchTransformers Diffusion CV SFT
emoji: ⚡
colorFrom: yellow
colorTo: indigo
sdk: streamlit
sdk_version: 1.43.2
app_file: app.py
pinned: false
license: mit
short_description: Torch Transformers Diffusion SFT for Computer Vision

Abstract

Harness torch, transformers, and diffusers for NLP and CV powered by supervised fine-tuning (SFT)! Dual st.camera_input 📷 captures feed a gallery, and the gallery images drive fine-tuning and RAG demos with CPU-friendly diffusion models. Key papers:

🌐 Streamlit: A Declarative Framework - Thiessen et al., 2023: UI magic.
🔥 PyTorch: High-Performance DL - Paszke et al., 2019: Torch core.
🧠 Attention is All You Need - Vaswani et al., 2017: NLP transformers.
🎨 Denoising Diffusion Probabilistic Models - Ho et al., 2020: DDPM foundation.
📊 Pandas: Data Analysis in Python - McKinney, 2010: Data handling.
🖼️ Pillow: Python Imaging - Clark et al., 2023: Image processing.
⏰ pytz: Time Zone Calculations - Henshaw, 2023: Time zones.
👁️ OpenCV: Computer Vision - Bradski, 2000: CV tools.
🎨 Latent Diffusion Models - Rombach et al., 2022: Efficient CV.
⚙️ LoRA: Low-Rank Adaptation - Hu et al., 2021: SFT efficiency.
🔍 Retrieval-Augmented Generation - Lewis et al., 2020: RAG base.

Run: pip install -r requirements.txt, then streamlit run app.py. Snap, tune, party! ⚡

Usage 🎯

📷 Camera Snap: Capture pics with dual cams, save PNGs.
- Single: Click "Take a picture".
- Burst: Set slice count, click "Capture X Frames 📸".
🔧 SFT: Fine-tune Causal LM with CSV or Diffusion with image-text pairs.
🌱 Build: Load CPU diffusion models:
- 🎨 OFA-Sys/small-stable-diffusion-v0 (~300 MB, LDM/Conditional).
- 🌫️ google/ddpm-ema-celebahq-256 (~280 MB, DDPM/SDE/Autoregressive Proxy).
🧪 Test: Pair text with images, pick pipeline, hit "Run Test 🚀".
🌐 RAG Party: NLP plans or CV images for superhero bashes!
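
The Build step above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the app's actual code: MODEL_PIPELINES, pipeline_class_name, and load_cpu_pipeline are hypothetical names, and the pairing of each model ID with a diffusers pipeline class is an assumption based on the model types listed above. It also assumes diffusers and torch are installed.

```python
# Hypothetical lookup: which diffusers pipeline class to use for each model ID.
# The pairing is an assumption based on the model types named in the list above.
MODEL_PIPELINES = {
    "OFA-Sys/small-stable-diffusion-v0": "StableDiffusionPipeline",  # text-conditioned LDM
    "google/ddpm-ema-celebahq-256": "DDPMPipeline",  # unconditional DDPM
}


def pipeline_class_name(model_id: str) -> str:
    """Return the name of the diffusers pipeline class assumed for model_id."""
    try:
        return MODEL_PIPELINES[model_id]
    except KeyError:
        raise ValueError(f"Unknown model: {model_id!r}") from None


def load_cpu_pipeline(model_id: str):
    """Download (or load from cache) a pipeline and place it on the CPU."""
    # Imported lazily so the lookup above works without the heavy dependencies.
    import diffusers
    import torch

    pipeline_cls = getattr(diffusers, pipeline_class_name(model_id))
    # float32 is the safe dtype on CPU; half precision is often unsupported there.
    pipe = pipeline_cls.from_pretrained(model_id, torch_dtype=torch.float32)
    return pipe.to("cpu")
```

With this sketch, the Stable Diffusion pipeline would be called with a text prompt, while the DDPM pipeline is unconditional and takes no prompt. On CPU, a low num_inference_steps value is likely needed to keep a test run tolerable.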

Tune NLP 🧠 or CV 🎨 fast! Texts 📝 or pics 📸, SFT shines ✨. pip install -r requirements.txt, streamlit run app.py. Snap cams 📷, craft art—AI’s lean & mean! 🎉 #SFTSpeed

SFT Tiny Titans 🚀 (Small Diffusion Delight!)

A Streamlit app for Supervised Fine-Tuning (SFT) of small diffusion models, featuring multi-camera capture, model testing, and agentic RAG demos with a playful UI.

Features 🎉

Build Titan 🌱: Spin up tiny diffusion models from Hugging Face (Micro Diffusion, Latent Diffusion, FLUX.1 Distilled).
Camera Snap 📷: Snap pics with 6 cameras using a 4-column grid UI per cam—witty, emoji-packed controls for device, label, hint, and visibility! 📸✨
Fine-Tune Titan (CV) 🔧: Tune models with 3 use cases—denoising, stylization, multi-angle generation—using your camera captures, with CSV/MD exports.
Test Titan (CV) 🧪: Generate images from prompts with your tuned diffusion titan.
Agentic RAG Party (CV) 🌐: Craft superhero party visuals from camera-inspired prompts.
Media Gallery 🎨: View, download, or zap captured images with flair.
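
The Camera Snap and CSV export features above might look something like this in code. It is a sketch, not the app's implementation: capture_filename, export_pairs_csv, and camera_snap are hypothetical helpers, and the captures directory and the image/text column names are assumptions. Only st.camera_input and Pillow's Image.open(...).save(...) are real library calls.

```python
import csv
from datetime import datetime, timezone
from pathlib import Path


def capture_filename(cam_label: str, when: datetime) -> str:
    """Build a PNG filename from a camera label and a timestamp."""
    safe_label = "".join(c if c.isalnum() else "_" for c in cam_label)
    return f"{safe_label}_{when:%Y%m%d_%H%M%S}.png"


def export_pairs_csv(pairs, csv_path) -> int:
    """Write (image_path, text) pairs to a CSV file; return the row count."""
    rows = [(str(image), text) for image, text in pairs]
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["image", "text"])  # assumed column names
        writer.writerows(rows)
    return len(rows)


def camera_snap(cam_label: str, out_dir: str = "captures"):
    """Streamlit widget: take one picture and save it as a PNG."""
    # Lazy imports: streamlit and Pillow are only needed inside the app.
    import streamlit as st
    from PIL import Image

    shot = st.camera_input("Take a picture", key=f"camera_{cam_label}")
    if shot is None:
        return None  # nothing captured yet
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    path = target / capture_filename(cam_label, datetime.now(timezone.utc))
    Image.open(shot).save(path)  # format is inferred from the .png suffix
    return path
```

Giving each widget its own key lets several cameras sit on one page, and the timestamped names keep burst captures from overwriting each other as long as they land in different seconds.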

Installation 🛠️

Clone the repo:

git clone <repository-url>
cd sft-tiny-titans

Abstract

TorchTransformers Diffusion SFT Titans harnesses torch, transformers, and diffusers for cutting-edge NLP and CV, powered by supervised fine-tuning (SFT). Dual st.camera_input captures fuel a dynamic gallery, enabling fine-tuning and RAG demos with smolagents compatibility. Key papers illuminate the stack:

Streamlit: A Declarative Framework for Data Apps - Thiessen et al., 2023: Streamlit’s UI framework.
PyTorch: An Imperative Style, High-Performance Deep Learning Library - Paszke et al., 2019: Torch foundation.
Attention is All You Need - Vaswani et al., 2017: Transformers for NLP.
Denoising Diffusion Probabilistic Models - Ho et al., 2020: Diffusion models in CV.
Pandas: A Foundation for Data Analysis in Python - McKinney, 2010: Data handling with Pandas.
Pillow: The Python Imaging Library - Clark et al., 2023: Image processing (no arXiv paper; cited as the library reference).
pytz: Time Zone Calculations in Python - Henshaw, 2023: Time handling (no arXiv paper; cited as the library reference).
OpenCV: Open Source Computer Vision Library - Bradski, 2000: CV processing (no arXiv paper; the seminal reference).
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale - Dosovitskiy et al., 2021: Vision Transformers, the basis for SFT in CV.
LoRA: Low-Rank Adaptation of Large Language Models - Hu et al., 2021: Efficient SFT techniques.
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks - Lewis et al., 2020: RAG foundations.
Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model - Zhou et al., 2024: Combined NLP/CV SFT.

Run: pip install -r requirements.txt, then streamlit run app.py. Snap, tune, party! ⚡