Jaiyam Sharma

dataplayer12

AI & ML interests

Computer Vision

Recent Activity

updated a model about 19 hours ago
dataplayer12/Mistral-Small-24B-Reasoning-Q4_K_M-GGUF
published a model about 19 hours ago
dataplayer12/Mistral-Small-24B-Reasoning-Q4_K_M-GGUF
updated a model 16 days ago
dataplayer12/phi-4-Q6_K

Organizations

None yet

dataplayer12's activity

reacted to chansung's post with 👍 17 days ago
Simple summary on DeepSeek AI's Janus-Pro: A fresh take on multimodal AI!

It builds on its predecessor, Janus, by tweaking the training methodology rather than the model architecture. The result? Improved performance in understanding and generating multimodal data.

Janus-Pro uses a three-stage training strategy, similar to Janus, but with key modifications:
✦ Stages 1 & 2: Focus on separate training for specific objectives, rather than mixing data.
✦ Stage 3: Fine-tuning with a careful balance of multimodal data.

Benchmarks show Janus-Pro holds its own against unified multimodal models like TokenFlow-XL and MetaMorph, and against specialized image-generation models like SD3 Medium and DALL-E 3.

The main limitation? Low image resolution (384x384). However, this seems like a strategic choice to focus on establishing a solid "recipe" for multimodal models. Future work will likely leverage this recipe and increased computing power to achieve higher resolutions.
New activity in nvidia/GPT-2B-001 almost 2 years ago

Does not work with NeMo container

3
#2 opened almost 2 years ago by
dataplayer12

gibberish on 4090

1
#4 opened almost 2 years ago by
lizelive