Sam Heutmaker committed
Commit 6c676f7 · 1 Parent(s): 43a643a

update readme

Files changed (1)
  1. README.md +3 -1
README.md CHANGED
@@ -39,7 +39,9 @@ model-index:
 
 ## Model Description
 
-**ClipTagger-12b** is a 12-billion parameter vision-language model (VLM) designed for video understanding at massive scale. Developed by [Inference.net](https://inference.net) in collaboration with [Grass](https://grass.io), this model was created to meet the demanding requirements of trillion-scale video frame captioning workloads.
+**ClipTagger-12b** is a 12-billion parameter vision-language model (VLM) designed for video understanding at massive scale. Developed by [Inference.net](https://inference.net) in collaboration with [Grass](https://grass.io), this model was created to meet the demanding requirements of trillion-scale video frame captioning workloads
+
+**ClipTagger-12b exceeds or matches the performance of GPT-4.1 and Claude 4 Sonnet, while costing 15x less per generation.**
 
 The model generates structured, schema-consistent JSON outputs for every video frame, making it ideal for building searchable video databases, content moderation systems, and accessibility tools. It maintains temporal consistency across frames while delivering frontier-quality performance at a fraction of the cost of closed-source alternatives.
 
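
The context paragraph in the hunk above describes the README's central technical property: one schema-consistent JSON record per video frame. As a rough illustration of what that consistency buys downstream consumers, here is a minimal Python sketch; the field names (`frame_index`, `description`, `objects`, `actions`) are hypothetical assumptions for illustration, not the schema documented in the model card.

```python
import json

# Hypothetical per-frame caption record. The actual ClipTagger-12b
# output schema is defined in the model card; these field names are
# illustrative assumptions only.
frame_record = {
    "frame_index": 1042,
    "description": "A cyclist rides past a fruit stand on a sunny street.",
    "objects": ["bicycle", "person", "fruit stand"],
    "actions": ["riding", "passing"],
}

# Schema consistency across frames means every record exposes the same
# key set, so indexing, search, and moderation pipelines can validate
# shape once and process frames uniformly without branching.
REQUIRED_KEYS = {"frame_index", "description", "objects", "actions"}

def is_schema_consistent(record: dict) -> bool:
    """Check that a frame record matches the expected key set."""
    return set(record) == REQUIRED_KEYS

assert is_schema_consistent(frame_record)
print(json.dumps(frame_record, indent=2))
```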