---
title: TorchTransformers Diffusion CV SFT
emoji: ⚡
colorFrom: yellow
colorTo: indigo
sdk: streamlit
sdk_version: 1.43.2
app_file: app.py
pinned: false
license: mit
short_description: Torch Transformers Diffusion SFT f. Streamlit & C. Vision
---
# TorchTransformers Diffusion CV SFT Titans 🚀
A Streamlit app blending `torch`, `transformers`, and `diffusers` for vision and NLP fun! Snap PDFs 📄, turn them into double-page spreads 🖼️, extract text with GPT 🤖, and craft emoji-packed Markdown outlines 📝, all with a witty UI and CPU-friendly SFT.
## Integration Details
1. **SFT Tiny Titans (First Listing)**:
- Features: Causal LM and Diffusion SFT, camera snap, RAG party.
- Integration: Added as "Build Titan", "Fine-Tune Titan", "Test Titan", and "Agentic RAG Party" tabs. Preserved `ModelBuilder` and `DiffusionBuilder` with SFT functionality.
2. **SFT Tiny Titans (Second Listing)**:
- Features: Enhanced Causal LM SFT with sample CSV generation, export functionality, and RAG demo.
- Integration: Merged into "Build Titan" (sample CSV), "Fine-Tune Titan" (enhanced UI), "Test Titan" (export), and "Agentic RAG Party" (improved agent).
3. **AI Vision Titans (Current)**:
- Features: PDF snapshotting, OCR with GOT-OCR2_0, Image Gen, GPT-based text extraction.
- Integration: Added as "Download PDFs", "Test OCR", "Test Image Gen", "PDF Process", "Image Process", and "MD Gallery" tabs. Retained async processing and gallery updates.
4. **Sidebar, Session, and History**:
- Unified gallery shows PNGs, PDFs, and MD files from all tabs.
- Session state (`captured_files`, `builder`, `model_loaded`, `processing`, `history`) tracks all operations.
- History log in sidebar records key actions (snapshots, SFT, tests).
5. **Workflow**:
- Snap images or download PDFs, snapshot to double-page spreads, extract text with GPT, and summarize into emoji outlines. Everything is saved in the gallery.
6. **Verification**:
- Run: `streamlit run app.py`
- Check: Camera snaps, PDF downloads, GPT text extraction, and Markdown outlines in gallery.
7. **Notes**:
- PDF URLs need direct links (e.g., arXiv's `/pdf/` path).
- CPU defaults with CUDA fallback for broad compatibility.
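
The direct-link note above can be handled before downloading. The sketch below is not taken from `app.py`; `to_direct_pdf_url` is a hypothetical helper that rewrites an arXiv abstract URL to its `/pdf/` form and leaves every other URL untouched.

```python
from urllib.parse import urlparse, urlunparse


def to_direct_pdf_url(url: str) -> str:
    """Hypothetical helper: map an arXiv /abs/ URL to its /pdf/ form.

    URLs that are not arXiv abstract pages are returned unchanged.
    """
    parts = urlparse(url)
    if parts.netloc.endswith("arxiv.org") and parts.path.startswith("/abs/"):
        # Swap the leading /abs/ segment for /pdf/ and keep the paper ID.
        new_path = "/pdf/" + parts.path[len("/abs/"):]
        return urlunparse(parts._replace(path=new_path))
    return url


print(to_direct_pdf_url("https://arxiv.org/abs/1706.03762"))
```

Pasting the result into "Download PDFs" should then give the app a link that points straight at the PDF.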
## Abstract
Fuse `torch`, `transformers`, and `diffusers` with GPT vision for a wild AI ride! Dual `st.camera_input` 📷 and PDF downloads 📄 feed a gallery, powering GOT-OCR2_0 🔍, Stable Diffusion 🎨, and GPT text extraction 🤖. Key papers:
- 🌐 **[Streamlit Framework](https://arxiv.org/abs/2308.03892)** - Thiessen et al., 2023: UI magic.
- 🔥 **[PyTorch DL](https://arxiv.org/abs/1912.01703)** - Paszke et al., 2019: Torch core.
- 🧠 **[Attention is All You Need](https://arxiv.org/abs/1706.03762)** - Vaswani et al., 2017: NLP transformers.
- 🎨 **[Denoising Diffusion Probabilistic Models](https://arxiv.org/abs/2006.11239)** - Ho et al., 2020: Diffusion basics.
- 🔍 **[GOT: General OCR Theory](https://arxiv.org/abs/2409.01704)** - Wei et al., 2024: Advanced OCR.
- 🎨 **[Latent Diffusion Models](https://arxiv.org/abs/2112.10752)** - Rombach et al., 2022: Image generation.
- ⚙️ **[LoRA: Low-Rank Adaptation](https://arxiv.org/abs/2106.09685)** - Hu et al., 2021: SFT efficiency.
- 🔍 **[RAG: Retrieval-Augmented Generation](https://arxiv.org/abs/2005.11401)** - Lewis et al., 2020: RAG foundations.
- 👁️ **[Vision Transformers](https://arxiv.org/abs/2010.11929)** - Dosovitskiy et al., 2020: Vision backbone.
- 📝 **[GPT-4 Technical Report](https://arxiv.org/abs/2303.08774)** - OpenAI, 2023: GPT power.
- 🖼️ **[CLIP: Learning Transferable Visual Models](https://arxiv.org/abs/2103.00020)** - Radford et al., 2021: Vision-language bridge.
- ⏰ **[Time Zone Handling in Python](https://arxiv.org/abs/2308.11235)** - Henshaw, 2023: `pytz` context.
Run: `pip install -r requirements.txt`, then `streamlit run app.py`. Snap, process, summarize! ⚡
## Usage 🎯
- 📷 **Camera Snap**: Capture pics with dual cams.
- 📥 **Download PDFs**: Fetch papers (e.g., the arXiv links above).
- 📄 **PDF Process**: Snapshot to double-page spreads, extract text with GPT.
- 🖼️ **Image Process**: OCR images with GPT vision.
- 📚 **MD Gallery**: Summarize Markdown files into emoji outlines.
## Tutorial: Single to Double Page Emoji Outlines
### Single Page Outline: Key Functions in `app.py`
| **Function** | **Purpose** 🎯 | **How It Works** 🛠️ | **Emoji Insight** 😎 |
|----------------------------|---------------------------------------------|--------------------------------------------------|-------------------------------|
| `generate_filename` | Unique file names 📅 | Adds timestamp to sequence | 🕰️ Time's your file buddy! |
| `pdf_url_to_filename` | Safe PDF names 🖋️ | Cleans URLs to underscores | 🚫 No URL mess! |
| `get_download_link` | Downloadable files ⬇️ | Base64-encodes for HTML links | 📦 Grab it, go! |
| `download_pdf` | Web PDF snatcher 🌐 | Fetches PDFs with `requests` | 📚 PDF pirate ahoy! |
| `process_pdf_snapshot` | PDF to images 🖼️ | Async snapshots (single/double/all) with `fitz` | 📸 Double-page dazzle! |
| `process_ocr` | Image text extractor 🔍 | Async GOT-OCR2_0 with `transformers` | 👀 Text ninja strikes! |
| `process_image_gen` | Prompt to image 🎨 | Async Stable Diffusion with `diffusers` | 🖌️ Art from words, bam! |
| `process_image_with_prompt` | GPT image analysis 🤖 | Base64 to GPT vision | 🧠 GPT sees all! |
| `process_text_with_prompt` | GPT text summarizer ✍️ | Text to GPT for outlining | 📝 Summarize like a pro! |
| `update_gallery` | File showcase 🖼️📖 | Sidebar display with delete options | 🌟 Your creations shine! |
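
To make the first three rows of the table concrete, here is a minimal sketch of what such helpers might look like. It is not the code in `app.py`: the signatures, the timestamp format, and the use of the standard library's `datetime` (the app itself lists `pytz`) are all assumptions made for illustration.

```python
import base64
import re
from datetime import datetime, timezone
from typing import Optional


def generate_filename(sequence: str, ext: str = "png",
                      now: Optional[datetime] = None) -> str:
    """Append a timestamp to a sequence label (assumed format: YYYYMMDD_HHMMSS)."""
    now = now or datetime.now(timezone.utc)
    return f"{sequence}_{now.strftime('%Y%m%d_%H%M%S')}.{ext}"


def pdf_url_to_filename(url: str) -> str:
    """Collapse every run of non-alphanumeric characters in a URL to one underscore."""
    safe = re.sub(r"[^A-Za-z0-9]+", "_", url).strip("_")
    return f"{safe}.pdf"


def get_download_link(data: bytes, filename: str,
                      mime: str = "application/octet-stream") -> str:
    """Build an HTML anchor that embeds the file as a Base64 data URI."""
    b64 = base64.b64encode(data).decode("ascii")
    return f'<a href="data:{mime};base64,{b64}" download="{filename}">Download {filename}</a>'


print(pdf_url_to_filename("https://arxiv.org/pdf/1706.03762"))
```

In a Streamlit app, a link built this way is typically rendered with `st.markdown(link, unsafe_allow_html=True)`. Data URIs embed the whole file in the page, so this approach suits small files best.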
### Double Page Outline: Libraries in `requirements.txt`
| **Library** | **Single Page Purpose** 🎯 | **Double Page Usage** 🛠️ | **Emoji Insight** 😎 |
|---------------|-------------------------------------------|----------------------------------------------------|-------------------------------|
| `streamlit` | App UI 🌐 | Tabs like “PDF Process 📄” and “MD Gallery 📚” | 🎬 App star: lights, action! |
| `pandas` | Data crunching 📈 | Ready for OCR/metadata tables | 📊 Table tamer awaits! |
| `torch` | ML engine 🔥 | Powers `transformers` and `diffusers` | 🔥 AI's fiery heart! |
| `requests` | Web grabber 🌐 | Downloads PDFs in `download_pdf` | 🌐 Web loot collector! |
| `aiofiles` | Fast file ops ⚡ | Async writes in `process_ocr` | ✈️ File speed demon! |
| `pillow` | Image magic 🖌️ | PDF to image in `process_pdf_snapshot` | 🖼️ Pixel Picasso! |
| `PyMuPDF` | PDF handler 📜 | Snapshots in `process_pdf_snapshot` | 📜 PDF scroll master! |
| `transformers`| AI models 🗣️ | GOT-OCR2_0 in `process_ocr` | 🤖 Brain in a box! |
| `diffusers` | Image gen 🎨 | Stable Diffusion in `process_image_gen` | 🎨 Art generator supreme! |
| `openai` | GPT vision/text 🤖 | Image/text processing in GPT functions | 🌌 All-seeing AI oracle! |
| `glob2` | File finder 🔍 | Gallery files in `update_gallery` | 🕵️ File sleuth! |
| `pytz` | Time zones ⏰ | Timestamps in `generate_filename` | ⏳ Time wizard! |
## Automation Instructions: Witty & Funny Steps 😂
1. **Load PDFs** 📚
- Drop URLs into “Download PDFs 📥” or upload files.
- *Emoji Tip*: 🦁 Unleash the PDF beast and roar through arXiv!
2. **Double-Page Snap** 📸
- Click “Snapshot Selected 📸” with “Two Pages (High-Res)” for landscape glory!
- *Witty Note*: Two pages > one, because who reads half a comic? 🦸
3. **GPT Vision Zap** ⚡
- In “PDF Process 📄”, pick a GPT model (e.g., `gpt-4o-mini`) and zap text out.
- *Funny Bit*: GPT's like “I see text, mortals!” 👁️
4. **Markdown Mash** 📝
- “MD Gallery 📚” takes Markdown files and smashes them into a 12-point emoji outline.
- *Sassy Tip*: 12 points, because 11's weak and 13's overkill! 😜
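
The double-page step boils down to pairing consecutive pages and placing each pair side by side. The sketch below shows only that bookkeeping, under the assumption that pages are paired as (0, 1), (2, 3), and so on, with a trailing odd page left on its own. `double_page_pairs` and `spread_size` are hypothetical names, and the actual rendering in `process_pdf_snapshot` may differ.

```python
from typing import List, Sequence, Tuple


def double_page_pairs(page_count: int) -> List[Tuple[int, ...]]:
    """Group zero-based page indices into consecutive pairs.

    An odd final page becomes a one-element tuple.
    """
    return [
        tuple(range(start, min(start + 2, page_count)))
        for start in range(0, page_count, 2)
    ]


def spread_size(page_sizes: Sequence[Tuple[int, int]]) -> Tuple[int, int]:
    """Canvas size (width, height) for pages placed side by side."""
    width = sum(w for w, _ in page_sizes)
    height = max(h for _, h in page_sizes)
    return width, height


print(double_page_pairs(5))
print(spread_size([(600, 800), (600, 820)]))
```

Summing widths and taking the tallest height is what yields the landscape canvas when two portrait pages are combined.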
## Innovative Features 🌟
- **Double-Page Spreads**: High-res, landscape images from PDFs, perfect for apps! 🖥️
- **GPT Model Picker**: Swap `gpt-4o` for `gpt-4o-mini`: speed vs. smarts! ⚡🧠
- **12-Point Emoji Outline**: Clusters facts into 12 witty sections, e.g., “1. Heroes 🦸”, “2. Tech 🔧”. 🎉
## Mermaid Process Flow 🧜‍♀️
```mermaid
graph TD
A[📚 PDFs] -->|📥 Download| B[📄 PDF Process]
B -->|📸 Snapshot| C[🖼️ Double-Page Images]
C -->|🤖 GPT Vision| D[📝 Markdown Files]
D -->|📚 MD Gallery| E[✍️ 12-Point Emoji Outline]
A:::pdf
B:::process
C:::image
D:::markdown
E:::outline
classDef pdf fill:#f9f,stroke:#333,stroke-width:2px;
classDef process fill:#bbf,stroke:#333,stroke-width:2px;
classDef image fill:#bfb,stroke:#333,stroke-width:2px;
classDef markdown fill:#ffb,stroke:#333,stroke-width:2px;
classDef outline fill:#fbf,stroke:#333,stroke-width:2px;
```
Flow Explained:
1. 📚 PDFs: Start with one or more PDFs on a topic.
2. 📄 PDF Process: Download and snapshot into high-res double-page spreads.
3. 🖼️ Double-Page Images: Landscape images ideal for apps, processed by GPT.
4. 📝 Markdown Files: Text extracted per document, saved as Markdown.
5. ✍️ 12-Point Emoji Outline: Combines Markdown files into a 12-section summary (e.g., “1. Context 📜”, “2. Methods 🔬”, ..., “12. Future 🚀”).
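
As a rough illustration of the last step, the sketch below assembles several Markdown documents into one outline request. The prompt wording and the `build_outline_prompt` name are assumptions for illustration only. The real prompt lives in `app.py`, and the call that sends it to a GPT model is deliberately left out.

```python
from typing import Dict


def build_outline_prompt(documents: Dict[str, str], sections: int = 12) -> str:
    """Hypothetical helper: merge Markdown documents into one outline request."""
    header = (
        f"Summarize the documents below into a {sections}-point Markdown outline. "
        "Give each point a short title with one emoji and group related facts under it."
    )
    parts = [header]
    for name, text in documents.items():
        # Label each document so the model can tell the sources apart.
        parts.append(f"\n--- {name} ---\n{text.strip()}")
    return "\n".join(parts)


prompt = build_outline_prompt({"paper_a.md": "# A\nalpha", "paper_b.md": "beta "})
print(prompt)
```

Keeping the section count as a parameter makes it easy to try shorter or longer outlines without touching the rest of the prompt.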
Run: `pip install -r requirements.txt`, then `streamlit run app.py`. Snap, process, outline: AI magic! ⚡
---
### Key Updates
1. **Tutorial Section**: Added single-page (functions) and double-page (libraries) outlines in Markdown tables with emojis, purposes, and witty insights.
2. **Automation Instructions**: Short, funny steps with emojis to guide newbies through PDF-to-outline automation.
3. **Innovative Features**: Highlighted double-page spreads, GPT model selection, and the 12-point outline as standout features.
4. **Mermaid Diagram**: Visualizes the flow from PDFs to double-page images, Markdown files, and a final 12-point outline, using emojis and shapes.
5. **Updated arXiv Links**: Refreshed to match current functionality (vision, OCR, GPT, diffusion):
- Added GOT-OCR2_0, Vision Transformers, GPT-4, and CLIP papers.
- Kept core papers (Streamlit, PyTorch, etc.) and adjusted for relevance.
### How to Use
- Save this as `README.md` in your project folder.
- View it in a Markdown renderer (e.g., GitHub, VS Code) to see tables and Mermaid diagram rendered.
- Follow the automation steps to process PDFs and generate outlines. They suit learners exploring AI vision and text summarization.
This README now serves as both a project overview and a tutorial, making it a fun, educational asset for all! 🚀