NYU VisionX

university

https://www.sainingxie.com/

nyu-visionx's activity

Riiiickkk

updated a dataset 1 day ago

nyu-visionx/pisa-experiments

Updated 1 day ago • 64 • 1

sayakpaul

authored a paper 3 days ago

SANA-Sprint: One-Step Diffusion with Continuous-Time Consistency Distillation

Paper • 2503.09641 • Published 5 days ago • 16

Riiiickkk

published a dataset 5 days ago

nyu-visionx/pisa-experiments

Updated 1 day ago • 64 • 1

xcpan

updated a model 5 days ago

nyu-visionx/oro_depth_ckpt

Updated 5 days ago

xcpan

published a model 11 days ago

nyu-visionx/oro_depth_ckpt

Updated 5 days ago

jihanyang

authored a paper 17 days ago

UniTok: A Unified Tokenizer for Visual Generation and Understanding

Paper • 2502.20321 • Published 17 days ago • 29

xcpan

updated a dataset 22 days ago

nyu-visionx/oro_depth_reward

Viewer • Updated 22 days ago • 889k • 154

sayakpaul

posted an update 27 days ago

Post

3136

Inference-time scaling meets Flux.1-Dev (and others) 🔥

Presenting a simple re-implementation of "Inference-Time Scaling for Diffusion Models beyond Scaling Denoising Steps" by Ma et al.

I implemented the simplest strategy, random search, but results can potentially be improved with better-guided search methods.

Supports Gemini 2 Flash & Qwen2.5 as verifiers for "LLMGrading" 🤗

The steps are simple:

For each round:

1> Start by sampling 2 starting noises with different seeds.
2> Score the generations w.r.t. a metric.
3> Obtain the best generation from the current round.

If you have more compute budget, go to the next search round. Scale the noise pool (2 ** search_round) and repeat steps 1-3.

This constitutes the random search method as done in the paper by Google DeepMind.
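
The loop above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code from the repository: `generate`, `score`, `toy_generate`, and `toy_score` are hypothetical stand-ins for the diffusion pipeline and the verifier, and rounds are numbered from 1 so that the first pool holds 2 noises.

```python
import random
from typing import Callable, List, Tuple


def random_search(
    generate: Callable[[int], float],
    score: Callable[[float], float],
    num_rounds: int,
    rng_seed: int = 0,
) -> List[Tuple[int, int, int, float]]:
    """Return (round, pool_size, best_seed, best_score) for each search round."""
    rng = random.Random(rng_seed)
    results = []
    for search_round in range(1, num_rounds + 1):
        # Noise pool grows as 2 ** search_round: 2, 4, 8, ...
        pool_size = 2 ** search_round
        # Step 1: sample starting noises, one per seed.
        seeds = [rng.randrange(2 ** 32) for _ in range(pool_size)]
        # Step 2: score every generation with the metric.
        scored = [(score(generate(seed)), seed) for seed in seeds]
        # Step 3: keep the best generation of this round.
        best_score, best_seed = max(scored)
        results.append((search_round, pool_size, best_seed, best_score))
    return results


def toy_generate(seed: int) -> float:
    # Hypothetical stand-in for "run the pipeline from this starting noise".
    return random.Random(seed).gauss(0.0, 1.0)


def toy_score(sample: float) -> float:
    # Hypothetical stand-in for a verifier: samples closer to 0 score higher.
    return -abs(sample)
```

Calling `random_search(toy_generate, toy_score, num_rounds=3)` evaluates pools of 2, 4, and 8 seeds. In a real setup, `generate` would wrap the image pipeline and `score` would wrap the chosen verifier.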

Code, more results, and a bunch of other stuff are in the repository. Check it out here: https://github.com/sayakpaul/tt-scale-flux/ 🤗

xcpan

published a dataset 29 days ago

nyu-visionx/oro_depth_reward

Viewer • Updated 22 days ago • 889k • 154

sayakpaul

posted an update about 2 months ago

Post

2007

We have been cooking a couple of fine-tuning runs on CogVideoX with finetrainers, smol datasets, and LoRA to generate cool video effects like crushing, dissolving, etc.

We are also releasing a LoRA extraction utility from a fully fine-tuned checkpoint. I know that kind of stuff has existed since eternity, but the quality on video models was nothing short of spectacular. Below are some links:

* Models and datasets: https://huggingface.co/finetrainers
* finetrainers: https://github.com/a-r-r-o-w/finetrainers
* LoRA extraction: https://github.com/huggingface/diffusers/blob/main/scripts/extract_lora_from_model.py


sainx

authored a paper about 2 months ago

SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training

Paper • 2501.17161 • Published Jan 28 • 109

jihanyang

authored a paper about 2 months ago

SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training

Paper • 2501.17161 • Published Jan 28 • 109

sayakpaul

posted an update about 2 months ago

Post

1972

We have authored a post to go over the state of video generation in the Diffusers ecosystem 🧨

We cover the supported models, the optimization knobs our users can turn, fine-tuning, and more 🔥

5-6GBs for HunyuanVideo, sky is the limit 🌌 🤗
https://huggingface.co/blog/video_gen

craigwu

authored a paper about 2 months ago

Video-MMMU: Evaluating Knowledge Acquisition from Multi-Discipline Professional Videos

Paper • 2501.13826 • Published Jan 23 • 24

sainx

authored a paper about 2 months ago

Inference-Time Scaling for Diffusion Models beyond Scaling Denoising Steps

Paper • 2501.09732 • Published Jan 16 • 70

jihanyang

updated a dataset 2 months ago

nyu-visionx/VSI-Bench

Viewer • Updated Jan 14 • 5.13k • 4.49k • 31

sayakpaul

posted an update 3 months ago

Post

4378

Commits speak louder than words 🤪

* 4 new video models
* Multiple image models, including SANA & Flux Control
* New quantizers -> GGUF & TorchAO
* New training scripts

Enjoy this holiday-special Diffusers release 🤗
Notes: https://github.com/huggingface/diffusers/releases/tag/v0.32.0

anjaliwgupta

authored a paper 3 months ago

Thinking in Space: How Multimodal Large Language Models See, Remember, and Recall Spaces

Paper • 2412.14171 • Published Dec 18, 2024 • 24

jihanyang

authored a paper 3 months ago

Thinking in Space: How Multimodal Large Language Models See, Remember, and Recall Spaces

Paper • 2412.14171 • Published Dec 18, 2024 • 24

rilynhan

authored a paper 3 months ago

Thinking in Space: How Multimodal Large Language Models See, Remember, and Recall Spaces

Paper • 2412.14171 • Published Dec 18, 2024 • 24


Team members 15
