# StarVector Validation

This module provides validation functionality for StarVector models, allowing evaluation of SVG generation quality across different model architectures and generation backends.

## Overview

The validation framework consists of:

1. A base `SVGValidator` class that handles common validation logic
2. Specific validator implementations for different backends, each registered with the framework (see the sketch after this list):
   - `StarVectorHFSVGValidator`: Uses HuggingFace generation API
   - `StarVectorVLLMValidator`: Uses vLLM for faster generation
   - `StarVectorVLLMAPIValidator`: Uses vLLM through REST API
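
Concretely, each backend validator registers itself with a registry keyed by class name, and the entry point instantiates whichever class the run configuration names. A minimal sketch of that mechanism, assuming a plain dict registry and a hypothetical `get_validator` helper and config key (the real `register_validator` is imported from `svg_validator_base`, as the example in section 2 shows):

```python
# Hypothetical sketch of the registry mechanics; the real registry lives in
# starvector/validation/__init__.py and svg_validator_base.py.
VALIDATOR_REGISTRY = {}

def register_validator(cls):
    """Class decorator: add a validator class to the registry by name."""
    VALIDATOR_REGISTRY[cls.__name__] = cls
    return cls

def get_validator(config):
    """Instantiate the validator class named in the run configuration."""
    cls = VALIDATOR_REGISTRY[config.validator_name]  # e.g. "StarVectorHFSVGValidator"
    return cls(config)
```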

## 1. Running Validation

### Using the HuggingFace Backend

```bash
# StarVector-1B
python starvector/validation/validate.py \
config=configs/generation/hf/starvector-1b/im2svg.yaml \
dataset.name=starvector/svg-stack

# StarVector-8B 
python starvector/validation/validate.py \
config=configs/generation/hf/starvector-8b/im2svg.yaml \
dataset.name=starvector/svg-stack
```

### Using the vLLM Backend

To use the vLLM backend (`StarVectorVLLMValidator` or `StarVectorVLLMAPIValidator`), first install the StarVector fork of vLLM, available [here](https://github.com/starvector/vllm):

```bash
git clone https://github.com/starvector/vllm.git
cd vllm
pip install -e .
```
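
To confirm the fork is picked up, an optional sanity check (assumes the package exposes `__version__`, as upstream vLLM does):

```bash
# Verify that the vLLM fork imports correctly
python -c "import vllm; print(vllm.__version__)"
```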

Then launch validation with the vLLM config file (it uses `StarVectorVLLMValidator`):

```bash
# StarVector-1B
python starvector/validation/validate.py \
config=configs/generation/vllm/starvector-1b/im2svg.yaml \
dataset.name=starvector/svg-stack

# StarVector-8B
python starvector/validation/validate.py \
config=configs/generation/vllm/starvector-8b/im2svg.yaml \
dataset.name=starvector/svg-stack
```

## 2. Creating a New SVG Validator

To create a new validator for a different model or generation backend:

1. Create a new class inheriting from `SVGValidator`
2. Implement required abstract methods:
   - `__init__(self, config)`: Initialize the validator with the given config
   - `get_dataloader(self)`: Build the dataloader for the configured dataset
   - `generate_svg(self, batch, generate_config)`: Generate SVGs for an input batch using the given generation settings
3. Add the new validator to the registry in `starvector/validation/__init__.py`

Example:

```python
from .svg_validator_base import SVGValidator, register_validator

@register_validator
class MyNewValidator(SVGValidator):
    def __init__(self, config):
        super().__init__(config)
        # Initialize your model/client here
        
    def generate_svg(self, batch, generate_config):
        # Implement generation logic
        # Return list of generated SVG strings
        pass
        
    def get_dataloader(self):
        # Implement dataloader logic
        pass
```
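
For step 3, if registration happens at import time through the `@register_validator` decorator, wiring the new class into the framework can be as simple as importing it from the package `__init__.py` (the module filename below is hypothetical):

```python
# starvector/validation/__init__.py
# Importing the module executes the @register_validator decorator,
# which adds MyNewValidator to the validator registry.
from .my_new_validator import MyNewValidator
```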

## Key Features

The validation framework provides:

- Automatic metrics calculation and logging
- WandB integration for experiment tracking
- Temperature sweep for exploring generation parameters (see the sketch after this list)
- Comparison plot generation
- Batch processing with configurable settings
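
For example, a temperature sweep can also be driven by hand from the shell using the dotted override syntax from the commands above (whether `generation_params.temperature` is overridable this way is an assumption based on that syntax):

```bash
# Hypothetical manual temperature sweep via CLI overrides
for t in 0.1 0.2 0.5 0.8; do
  python starvector/validation/validate.py \
    config=configs/generation/hf/starvector-1b/im2svg.yaml \
    dataset.name=starvector/svg-stack \
    generation_params.temperature=$t
done
```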

## Configuration

Validation is configured through YAML files in `configs/generation/`. Key configuration sections:

```yaml
model:
  name: "model_name"  # HF model name or path
  task: "im2svg"      # Task type
  torch_dtype: "float16"  # Model precision

dataset:
  dataset_name: "svg-stack"  # Dataset to validate on
  batch_size: 1
  num_workers: 4

generation_params:
  temperature: 0.2
  top_p: 0.9
  max_length: 1024
  # ... other generation parameters

run:
  report_to: "wandb"  # Logging backend
  out_dir: "outputs"  # Output directory
```
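
Individual values can be overridden per run with the same dotted `key=value` syntax used in the commands above (the key paths below come from this file; treating all of them as overridable is an assumption):

```bash
python starvector/validation/validate.py \
config=configs/generation/hf/starvector-1b/im2svg.yaml \
dataset.name=starvector/svg-stack \
generation_params.max_length=2048 \
run.out_dir=outputs/long-context
```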

## Output Structure

The validator creates the following directory structure:

```
out_dir/
β”œβ”€β”€ {model}_{dataset}_{timestamp}/
β”‚   β”œβ”€β”€ config.yaml           # Run configuration
β”‚   β”œβ”€β”€ results/
β”‚   β”‚   β”œβ”€β”€ results_avg.json  # Average metrics
β”‚   β”‚   └── all_results.csv   # Per-sample metrics
β”‚   └── {sample_id}/         # Per-sample outputs
β”‚       β”œβ”€β”€ metadata.json
β”‚       β”œβ”€β”€ {sample_id}.svg
β”‚       β”œβ”€β”€ {sample_id}_raw.svg
β”‚       β”œβ”€β”€ {sample_id}_gt.svg
β”‚       β”œβ”€β”€ {sample_id}_generated.png
β”‚       β”œβ”€β”€ {sample_id}_original.png
β”‚       └── {sample_id}_comparison.png
```
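
Both results files are plain JSON/CSV, so downstream analysis needs nothing beyond the standard library. A minimal sketch (the run directory name is a placeholder following the `{model}_{dataset}_{timestamp}` pattern above):

```python
import csv
import json
from pathlib import Path

# Placeholder run directory following {model}_{dataset}_{timestamp}
run_dir = Path("outputs/starvector-1b_svg-stack_20240101-120000")

# Aggregated metrics for the whole run
with open(run_dir / "results" / "results_avg.json") as f:
    print(json.load(f))

# Per-sample metrics, one row per generated SVG
with open(run_dir / "results" / "all_results.csv") as f:
    rows = list(csv.DictReader(f))
print(f"{len(rows)} samples evaluated")
```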