---
title: README
emoji: π
colorFrom: red
colorTo: indigo
sdk: static
pinned: false
---

# UV Scripts

**Ready-to-run ML tools powered by UV - zero setup, maximum power**

Run state-of-the-art ML workflows with a single command. From OCR to classification, every script runs directly with `uv run`.

## What are UV scripts?

UV scripts are self-contained Python scripts that use [inline metadata](https://docs.astral.sh/uv/guides/scripts/) to declare their dependencies. Run `uv run script.py` and UV installs those dependencies into an isolated environment before executing the script.

Perfect for:

- **GPU workflows** on [HF Jobs](https://huggingface.co/docs/huggingface_hub/guides/jobs)
- **Local processing** on your machine
- **Reproducible pipelines** that run the same way on any machine with UV installed

## Quick Example

```bash
# Extract text from images with state-of-the-art OCR (no local GPU needed!)
hf jobs uv run --flavor l4x1 \
https://huggingface.co/datasets/uv-scripts/ocr/raw/main/nanonets-ocr.py \
your-images your-extracted-text
```

## Browse Scripts

| Script Collection                                                               | Description                                               | GPU Required |
| ------------------------------------------------------------------------------- | --------------------------------------------------------- | ------------ |
| [ocr](https://huggingface.co/datasets/uv-scripts/ocr)                           | Extract text from images with VLMs (LaTeX, tables, forms) | β            |
| [classification](https://huggingface.co/datasets/uv-scripts/classification)     | Text classification with guaranteed valid outputs         | β            |
| [dataset-creation](https://huggingface.co/datasets/uv-scripts/dataset-creation) | Create datasets from PDFs and files                       | β            |
| [vllm](https://huggingface.co/datasets/uv-scripts/vllm)                         | High-performance inference with vLLM                      | β            |
| [synthetic-data](https://huggingface.co/datasets/uv-scripts/synthetic-data)     | Generate high-quality synthetic data with CoT reasoning   | β            |
| [deduplication](https://huggingface.co/datasets/uv-scripts/deduplication)       | Remove duplicates using semantic similarity               | β            |
| [openai-oss](https://huggingface.co/datasets/uv-scripts/openai-oss)             | Generate responses with visible reasoning traces          | β            |

## Why UV Scripts?

### Zero Setup

No virtual environments to manage, no dependency conflicts, no installation steps. UV resolves and installs everything when you run the script.

### GPU Optimized

Run on a local GPU or scale to the cloud with [HF Jobs](https://huggingface.co/docs/huggingface_hub/guides/jobs). Same script, different compute.

## Featured Scripts

### OCR Any Document Dataset

Extract text from images with state-of-the-art accuracy:

```bash
# Handles LaTeX, tables, forms, handwriting
hf jobs uv run --flavor l4x1 \
https://huggingface.co/datasets/uv-scripts/ocr/raw/main/nanonets-ocr.py \
your-images extracted-text
```

### Deduplicate Datasets (CPU-Friendly!)

Remove duplicates using semantic similarity - no GPU needed:

```bash
# Fast semantic deduplication on CPU
uv run https://huggingface.co/datasets/uv-scripts/deduplication/raw/main/semantic-dedupe.py \
your-dataset text your-dataset-clean \
--method duplicates --threshold 0.9
```

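As a rough illustration of what a similarity threshold means, the sketch below performs a greedy deduplication over toy embedding vectors. It is not the implementation used by `semantic-dedupe.py`. It assumes that `--threshold 0.9` acts as a cosine-similarity cutoff and that embeddings are non-zero vectors; the function names and data are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def deduplicate(embeddings, threshold=0.9):
    """Return indices of rows to keep, scanning in order (greedy).

    A row is dropped when its similarity to any already-kept row
    reaches the threshold.
    """
    kept = []
    for i, vec in enumerate(embeddings):
        if all(cosine_similarity(vec, embeddings[j]) < threshold for j in kept):
            kept.append(i)
    return kept


# Toy 2-D "embeddings"; real sentence embeddings have hundreds of dimensions.
toy_embeddings = [
    [1.0, 0.0],    # row 0
    [0.99, 0.05],  # row 1: almost the same direction as row 0
    [0.0, 1.0],    # row 2: unrelated to row 0
]

print(deduplicate(toy_embeddings, threshold=0.9))  # [0, 2]
```

Under this reading, raising the threshold keeps more near-duplicates and lowering it removes more aggressively.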
### Generate Synthetic Training Data

Create high-quality synthetic data with chain-of-thought reasoning:

```bash
# Generate synthetic math problems with reasoning
hf jobs uv run --flavor l4x1 \
https://huggingface.co/datasets/uv-scripts/synthetic-data/raw/main/cot-self-instruct.py \
--seed-dataset math-examples --output-dataset synthetic-math \
--task-type reasoning --num-samples 1000
```

## Getting Started with HF Jobs

Run any UV script on GPU infrastructure:

```bash
hf jobs uv run --flavor l4x1 \
https://huggingface.co/datasets/uv-scripts/[collection]/raw/main/[script].py \
[args]
```

Choose your GPU flavor:

- `l4x1` - Good balance for most tasks
- `a10g-large` - More memory for larger models
- `a100-large` - Maximum performance

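The URL pattern and the two ways of launching a script (locally with `uv run`, or remotely with `hf jobs uv run --flavor ...`) can be captured in a small helper. This is a sketch that only assembles the command lines shown in this README. The helper names `script_url` and `build_command` are hypothetical, and nothing is executed.

```python
import shlex

BASE_URL = "https://huggingface.co/datasets/uv-scripts"


def script_url(collection: str, script: str) -> str:
    """Build the raw URL for a script, following the pattern shown above."""
    return f"{BASE_URL}/{collection}/raw/main/{script}.py"


def build_command(url, script_args, flavor=None):
    """Return the argv list for a local run, or an HF Jobs run if a flavor is given."""
    if flavor is None:
        prefix = ["uv", "run"]
    else:
        prefix = ["hf", "jobs", "uv", "run", "--flavor", flavor]
    return [*prefix, url, *script_args]


url = script_url("ocr", "nanonets-ocr")
args = ["your-images", "extracted-text"]

# Same script, different compute: only the command prefix changes.
print(shlex.join(build_command(url, args)))
print(shlex.join(build_command(url, args, flavor="l4x1")))
```

To actually launch a job, the resulting list could be passed to `subprocess.run(cmd, check=True)`, assuming the `uv` and `hf` command-line tools are installed and authenticated.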
## Learn More

- [UV Documentation](https://docs.astral.sh/uv/)
- [HF Jobs Guide](https://huggingface.co/docs/huggingface_hub/guides/jobs)
- [Script Examples](https://github.com/astral-sh/uv/tree/main/scripts)

---

_UV Scripts is a community project showcasing the power of [UV](https://github.com/astral-sh/uv) for ML workflows._