# StarVector Validation

This module provides validation functionality for StarVector models, allowing evaluation of SVG generation quality across different model architectures and generation backends.

## Overview

The validation framework consists of:

1. A base `SVGValidator` class that handles common validation logic
2. Specific validator implementations for different backends, each registered with the framework (see the sketch after this list):
   - `StarVectorHFSVGValidator`: Uses HuggingFace generation API
   - `StarVectorVLLMValidator`: Uses vLLM for faster generation
   - `StarVectorVLLMAPIValidator`: Uses vLLM through REST API
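
Concretely, each backend validator registers itself with a registry keyed by class name, and the entry point instantiates whichever class the run configuration names. A minimal sketch of that mechanism, assuming a plain dict registry and a hypothetical `get_validator` helper and config key (the real `register_validator` is imported from `svg_validator_base`, as the example in section 2 shows):

```python
# Hypothetical sketch of the registry mechanics; the real registry lives in
# starvector/validation/__init__.py and svg_validator_base.py.
VALIDATOR_REGISTRY = {}

def register_validator(cls):
    """Class decorator: add a validator class to the registry by name."""
    VALIDATOR_REGISTRY[cls.__name__] = cls
    return cls

def get_validator(config):
    """Instantiate the validator class named in the run configuration."""
    cls = VALIDATOR_REGISTRY[config.validator_name]  # e.g. "StarVectorHFSVGValidator"
    return cls(config)
```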

## 1. Running Validation

### Using the HuggingFace Backend

```bash
# StarVector-1B
python starvector/validation/validate.py \
config=configs/generation/hf/starvector-1b/im2svg.yaml \
dataset.name=starvector/svg-stack

# StarVector-8B 
python starvector/validation/validate.py \
config=configs/generation/hf/starvector-8b/im2svg.yaml \
dataset.name=starvector/svg-stack
```

### Using the vLLM Backend

To use the vLLM backend (`StarVectorVLLMValidator` or `StarVectorVLLMAPIValidator`), first install the StarVector fork of vLLM, available [here](https://github.com/starvector/vllm):

```bash
git clone https://github.com/starvector/vllm.git
cd vllm
pip install -e .
```
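
To confirm the fork is picked up, an optional sanity check (assumes the package exposes `__version__`, as upstream vLLM does):

```bash
# Verify that the vLLM fork imports correctly
python -c "import vllm; print(vllm.__version__)"
```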

Then launch validation with the vLLM config file (it uses `StarVectorVLLMValidator`):

```bash
# StarVector-1B
python starvector/validation/validate.py \
config=configs/generation/vllm/starvector-1b/im2svg.yaml \
dataset.name=starvector/svg-stack

# StarVector-8B
python starvector/validation/validate.py \
config=configs/generation/vllm/starvector-8b/im2svg.yaml \
dataset.name=starvector/svg-stack
```

## 2. Creating a New SVG Validator

To create a new validator for a different model or generation backend:

1. Create a new class inheriting from `SVGValidator`
2. Implement required abstract methods:
   - `__init__(self, config)`: Initialize the validator with the given config
   - `get_dataloader(self)`: Build the dataloader for the configured dataset
   - `generate_svg(self, batch, generate_config)`: Generate SVGs for an input batch using the given generation settings
3. Add the new validator to the registry in `starvector/validation/__init__.py`

Example:

```python
from .svg_validator_base import SVGValidator, register_validator

@register_validator
class MyNewValidator(SVGValidator):
    def __init__(self, config):
        super().__init__(config)
        # Initialize your model/client here
        
    def generate_svg(self, batch, generate_config):
        # Implement generation logic
        # Return list of generated SVG strings
        pass
        
    def get_dataloader(self):
        # Implement dataloader logic
        pass
```
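
For step 3, if registration happens at import time through the `@register_validator` decorator, wiring the new class into the framework can be as simple as importing it from the package `__init__.py` (the module filename below is hypothetical):

```python
# starvector/validation/__init__.py
# Importing the module executes the @register_validator decorator,
# which adds MyNewValidator to the validator registry.
from .my_new_validator import MyNewValidator
```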

## Key Features

The validation framework provides:

- Automatic metrics calculation and logging
- WandB integration for experiment tracking
- Temperature sweep for exploring generation parameters (see the sketch after this list)
- Comparison plot generation
- Batch processing with configurable settings
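
For example, a temperature sweep can also be driven by hand from the shell using the dotted override syntax from the commands above (whether `generation_params.temperature` is overridable this way is an assumption based on that syntax):

```bash
# Hypothetical manual temperature sweep via CLI overrides
for t in 0.1 0.2 0.5 0.8; do
  python starvector/validation/validate.py \
    config=configs/generation/hf/starvector-1b/im2svg.yaml \
    dataset.name=starvector/svg-stack \
    generation_params.temperature=$t
done
```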

## Configuration

Validation is configured through YAML files in `configs/generation/`. Key configuration sections:

```yaml
model:
  name: "model_name"  # HF model name or path
  task: "im2svg"      # Task type
  torch_dtype: "float16"  # Model precision

dataset:
  dataset_name: "svg-stack"  # Dataset to validate on
  batch_size: 1
  num_workers: 4

generation_params:
  temperature: 0.2
  top_p: 0.9
  max_length: 1024
  # ... other generation parameters

run:
  report_to: "wandb"  # Logging backend
  out_dir: "outputs"  # Output directory
```
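
Individual values can be overridden per run with the same dotted `key=value` syntax used in the commands above (the key paths below come from this file; treating all of them as overridable is an assumption):

```bash
python starvector/validation/validate.py \
config=configs/generation/hf/starvector-1b/im2svg.yaml \
dataset.name=starvector/svg-stack \
generation_params.max_length=2048 \
run.out_dir=outputs/long-context
```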

## Output Structure

The validator creates the following directory structure:

```
out_dir/
β”œβ”€β”€ {model}_{dataset}_{timestamp}/
β”‚   β”œβ”€β”€ config.yaml           # Run configuration
β”‚   β”œβ”€β”€ results/
β”‚   β”‚   β”œβ”€β”€ results_avg.json  # Average metrics
β”‚   β”‚   └── all_results.csv   # Per-sample metrics
β”‚   └── {sample_id}/         # Per-sample outputs
β”‚       β”œβ”€β”€ metadata.json
β”‚       β”œβ”€β”€ {sample_id}.svg
β”‚       β”œβ”€β”€ {sample_id}_raw.svg
β”‚       β”œβ”€β”€ {sample_id}_gt.svg
β”‚       β”œβ”€β”€ {sample_id}_generated.png
β”‚       β”œβ”€β”€ {sample_id}_original.png
β”‚       └── {sample_id}_comparison.png
```
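
Both results files are plain JSON/CSV, so downstream analysis needs nothing beyond the standard library. A minimal sketch (the run directory name is a placeholder following the `{model}_{dataset}_{timestamp}` pattern above):

```python
import csv
import json
from pathlib import Path

# Placeholder run directory following {model}_{dataset}_{timestamp}
run_dir = Path("outputs/starvector-1b_svg-stack_20240101-120000")

# Aggregated metrics for the whole run
with open(run_dir / "results" / "results_avg.json") as f:
    print(json.load(f))

# Per-sample metrics, one row per generated SVG
with open(run_dir / "results" / "all_results.csv") as f:
    rows = list(csv.DictReader(f))
print(f"{len(rows)} samples evaluated")
```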