Accurate & efficient vision models, ops and systems
AI & ML interests
Computer Vision, AI, Machine Learning
Papers
IMG: Calibrating Diffusion Models via Implicit Multimodal Guidance
OLA-VLM: Elevating Visual Perception in Multimodal LLMs with Auxiliary Embedding Distillation
spaces
11
pinned
Running
5
Physical AI Bench Leaderboard
Benchmark for Physical AI generation and understanding
Running on Zero
5
VisPer-LM
Visualize image depth, segmentation, and generation
Runtime error
Slow Fast Video Mllm
Describe video content with text prompts
Runtime error
Featured
408
Versatile Diffusion
Runtime error
Featured
63
VCoder
Build error
10
Smooth Diffusion
models
57
shi-labs/IMG • Updated • 1
shi-labs/slowfast-video-mllm-qwen2-7b-convnext-576-frame64-s1t4 • Video-Text-to-Text • 9B • Updated • 46
shi-labs/slowfast-video-mllm-qwen2-7b-convnext-576-frame96-s1t6 • Video-Text-to-Text • 9B • Updated • 8
shi-labs/slowfast-video-mllm-qwen2-7b-convnext-576-frame128-s2t4 • 9B • Updated • 5
shi-labs/probe_depth_ola-vlm-pt-ift • Image-Text-to-Text • 10B • Updated • 4
shi-labs/probe_gen_ola-vlm-pt-ift • Image-Text-to-Text • 9B • Updated • 8
shi-labs/probe_gen_llava-1.5-pt-vpt-ift • Image-Text-to-Text • 9B • Updated • 8
shi-labs/probe_gen_llava-1.5-pt • Image-Text-to-Text • 9B • Updated • 5
shi-labs/probe_gen_llava-1.5-pt-0.5ift • Image-Text-to-Text • 9B • Updated • 10
shi-labs/probe_gen_llava-1.5-pt-ift • Image-Text-to-Text • 9B • Updated • 8
datasets
8
shi-labs/physical-ai-bench-generation • Viewer • Updated • 1.04k • 1.01k • 3
shi-labs/physical-ai-bench-conditional-generation • Viewer • Updated • 600 • 2.32k
shi-labs/physical-ai-bench-understanding • Viewer • Updated • 1.21k • 826
shi-labs/Eagle-1.8M • Updated • 112 • 7
shi-labs/Agriculture-Vision • Preview • Updated • 38 • 3
shi-labs/CuMo_dataset • Preview • Updated • 28 • 6
shi-labs/COST • Updated • 226 • 4
shi-labs/oneformer_demo • Preview • Updated • 97.9k