---
title: README
emoji: 📚
colorFrom: red
colorTo: indigo
sdk: static
pinned: false
---
# UV Scripts
**Ready-to-run ML tools powered by UV - zero setup, maximum power**
Run state-of-the-art ML workflows with a single command. From OCR to classification, all scripts work instantly with `uv run`.
## What are UV scripts?
UV scripts are self-contained Python scripts that use [inline metadata](https://docs.astral.sh/uv/guides/scripts/) to specify dependencies. Just `uv run script.py` and everything installs automatically.
Perfect for:
- 🚀 **GPU workflows** on [HF Jobs](https://huggingface.co/docs/huggingface_hub/guides/jobs)
- 💻 **Local processing** on your machine
- 🔄 **Reproducible pipelines** that work anywhere
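The inline metadata mentioned above is a TOML block written as comments at the top of the script, in the format defined by PEP 723. The sketch below embeds a hypothetical example script (the dependency names are placeholders) and extracts its metadata block with the regular expression from the PEP 723 reference implementation. It only illustrates the format; it is not a description of how UV is implemented.

```python
import re
from typing import Optional

# Regular expression taken from the PEP 723 reference implementation.
METADATA_BLOCK = re.compile(
    r"(?m)^# /// (?P<type>[a-zA-Z0-9-]+)$\s(?P<content>(^#(| .*)$\s)+)^# ///$"
)

# Hypothetical script; the dependency list is a placeholder for illustration.
EXAMPLE_SCRIPT = '''\
# /// script
# requires-python = ">=3.10"
# dependencies = [
#     "datasets",
#     "tqdm",
# ]
# ///
print("dependencies are installed before this line runs")
'''


def extract_script_metadata(source: str) -> Optional[str]:
    """Return the TOML text of the `script` metadata block, or None if absent."""
    for match in METADATA_BLOCK.finditer(source):
        if match.group("type") == "script":
            # Strip the leading "# " (or bare "#") from every comment line.
            return "".join(
                line[2:] if line.startswith("# ") else line[1:]
                for line in match.group("content").splitlines(keepends=True)
            )
    return None


metadata = extract_script_metadata(EXAMPLE_SCRIPT)
print(metadata)
```

Saving the contents of `EXAMPLE_SCRIPT` to a file and passing it to `uv run` should install the listed packages into an isolated environment before the script body executes.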
## 🚀 Quick Example
```bash
# Extract text from images with state-of-the-art OCR (no local GPU needed!)
hf jobs uv run --flavor l4x1 \
https://huggingface.co/datasets/uv-scripts/ocr/raw/main/nanonets-ocr.py \
your-images your-extracted-text
```
## 📚 Browse Scripts
| Script Collection                                                               | Description                                               | GPU Required |
| ------------------------------------------------------------------------------- | --------------------------------------------------------- | ------------ |
| [ocr](https://huggingface.co/datasets/uv-scripts/ocr)                           | Extract text from images with VLMs (LaTeX, tables, forms) | ✅           |
| [classification](https://huggingface.co/datasets/uv-scripts/classification)     | Text classification with guaranteed valid outputs         | ✅           |
| [dataset-creation](https://huggingface.co/datasets/uv-scripts/dataset-creation) | Create datasets from PDFs and files                       | ❌           |
| [vllm](https://huggingface.co/datasets/uv-scripts/vllm)                         | High-performance inference with vLLM                      | ✅           |
| [synthetic-data](https://huggingface.co/datasets/uv-scripts/synthetic-data)     | Generate high-quality synthetic data with CoT reasoning   | ✅           |
| [deduplication](https://huggingface.co/datasets/uv-scripts/deduplication)       | Remove duplicates using semantic similarity               | ❌           |
| [openai-oss](https://huggingface.co/datasets/uv-scripts/openai-oss)             | Generate responses with visible reasoning traces          | ✅           |
## 🎯 Why UV Scripts?
### Zero Setup
No virtual environments to manage, no dependency conflicts between projects, no separate installation steps. UV installs each script's declared dependencies in an isolated environment when you run it.
### GPU Optimized
Run on local GPUs or scale to the cloud with [HF Jobs](https://huggingface.co/docs/huggingface_hub/guides/jobs). The same script runs on either.
## 🌟 Featured Scripts
### OCR Any Document Dataset
Extract text from images with state-of-the-art accuracy:
```bash
# Handles LaTeX, tables, forms, handwriting
hf jobs uv run --flavor l4x1 \
https://huggingface.co/datasets/uv-scripts/ocr/raw/main/nanonets-ocr.py \
your-images extracted-text
```
### Deduplicate Datasets (CPU-Friendly!)
Remove duplicates using semantic similarity - no GPU needed:
```bash
# Fast semantic deduplication on CPU
uv run https://huggingface.co/datasets/uv-scripts/deduplication/raw/main/semantic-dedupe.py \
your-dataset text your-dataset-clean \
--method duplicates --threshold 0.9
```
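To make the `--threshold 0.9` option above concrete, here is a toy sketch of threshold-based semantic deduplication. It assumes the threshold is a cosine-similarity cutoff between embedding vectors. The vectors and the `drop_near_duplicates` helper are made up for illustration, and the actual `semantic-dedupe.py` script may embed and compare texts differently.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def drop_near_duplicates(
    embeddings: Sequence[Sequence[float]], threshold: float = 0.9
) -> List[int]:
    """Return indices of rows to keep, scanning in order.

    A row is dropped when its similarity to any already-kept row
    reaches the threshold (assumed meaning of --threshold).
    """
    kept: List[int] = []
    for i, vec in enumerate(embeddings):
        if all(cosine(vec, embeddings[j]) < threshold for j in kept):
            kept.append(i)
    return kept


# Toy 2-D "embeddings": rows 0 and 1 point in almost the same direction.
toy_embeddings = [[1.0, 0.0], [0.99, 0.05], [0.0, 1.0]]
kept_rows = drop_near_duplicates(toy_embeddings, threshold=0.9)
print(kept_rows)
```

Under this assumption, a higher threshold removes only near-identical rows, while a lower one removes looser paraphrases as well.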
### Generate Synthetic Training Data
Create high-quality synthetic data with chain-of-thought reasoning:
```bash
# Generate synthetic math problems with reasoning
hf jobs uv run --flavor l4x1 \
https://huggingface.co/datasets/uv-scripts/synthetic-data/raw/main/cot-self-instruct.py \
--seed-dataset math-examples --output-dataset synthetic-math \
--task-type reasoning --num-samples 1000
```
## 🚀 Getting Started with HF Jobs
Run any UV script on GPU infrastructure:
```bash
hf jobs uv run --flavor l4x1 \
https://huggingface.co/datasets/uv-scripts/[collection]/raw/main/[script].py \
[args]
```
Choose your GPU flavor:
- `l4x1` - Good balance for most tasks
- `a10g-large` - More memory for larger models
- `a100-large` - Maximum performance
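If you launch jobs from Python, the command template above can be assembled programmatically. This is a minimal sketch that only builds the command line; `build_job_command` is a hypothetical helper, not part of `hf` or UV, and it assumes every script follows the URL pattern shown above.

```python
import shlex
from typing import List, Sequence

# URL pattern from the template above.
RAW_URL = "https://huggingface.co/datasets/uv-scripts/{collection}/raw/main/{script}.py"


def build_job_command(
    collection: str,
    script: str,
    args: Sequence[str],
    flavor: str = "l4x1",
) -> List[str]:
    """Build the `hf jobs uv run` argument list for one UV script."""
    url = RAW_URL.format(collection=collection, script=script)
    return ["hf", "jobs", "uv", "run", "--flavor", flavor, url, *args]


command = build_job_command(
    "ocr", "nanonets-ocr", ["your-images", "your-extracted-text"]
)
# Print a copy-pasteable shell command; nothing is executed here.
print(shlex.join(command))
```

The resulting list can be passed to `subprocess.run` on a machine where the `hf` CLI is installed and logged in.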
## 📖 Learn More
- [UV Documentation](https://docs.astral.sh/uv/)
- [HF Jobs Guide](https://huggingface.co/docs/huggingface_hub/guides/jobs)
- [Script Examples](https://github.com/astral-sh/uv/tree/main/scripts)
---
_UV Scripts is a community project showcasing the power of [UV](https://github.com/astral-sh/uv) for ML workflows._