Shoaib Hossain

KillerShoaib

AI & ML interests

Computer Vision: Object Detection, Segmentation, Generative AI; NLP: LLMs

Organizations

None yet

KillerShoaib's activity

reacted to tianchez's post with 🚀 26 days ago

Post

4123

Introducing VLM-R1!

GRPO has helped DeepSeek R1 learn reasoning. Can it also help VLMs perform better on general computer vision tasks?

The answer is YES, and it generalizes better than SFT. We trained Qwen 2.5 VL 3B on RefCOCO (a visual grounding task) and evaluated it on RefCOCO Val and RefGTA (an OOD task).

https://github.com/om-ai-lab/VLM-R1

3 replies

reacted to ezgikorkmaz's post with 👍 about 1 month ago

Post

2123

If you are interested in deep reinforcement learning, you can now register for the AAAI 2025 tutorial I am organizing!

Link: https://sites.google.com/view/aisafety-aaai2025

upvoted a collection about 1 month ago

Bangla Datasets for LLMs Finetuning

Collection

This collection contains Bengali datasets that are effective and useful for LLM fine-tuning, instruction tuning, and various other NLP tasks • 39 items • Updated Aug 31, 2024 • 7

updated a dataset about 2 months ago

KillerShoaib/RakibulAI-Utub-Bangla-Transcription

Viewer • Updated Jan 24 • 302 • 52 • 1

published a dataset about 2 months ago

KillerShoaib/RakibulAI-Utub-Bangla-Transcription

Viewer • Updated Jan 24 • 302 • 52 • 1

liked a model 3 months ago

ai4bharat/indic-parler-tts

Text-to-Speech • Updated Dec 9, 2024 • 30.5k • 113

reacted to nyuuzyou's post with 👍 4 months ago

Post

2332

🎵 Introducing Suno Music Generation Dataset - nyuuzyou/suno

Dataset highlights:

- 659,788 AI-generated music samples with comprehensive metadata from suno.com
- Multilingual content with English as the primary language, including Japanese and other languages
- Each entry contains rich metadata, including:
  - Unique song ID, audio/video URLs, and thumbnail images
  - AI model version and generation parameters
  - Song metadata (tags, prompts, duration)
  - Creator information and engagement metrics
- Released to the public domain under the Creative Commons Zero (CC0) license

The dataset structure includes detailed information about each generated piece, from technical parameters to user engagement metrics, making it particularly valuable for:
- Music generation model training
- Cross-modal analysis (text-to-audio relationships)
- User engagement studies
- Audio classification tasks
- Music style and genre analysis

liked a model 4 months ago

turjo4nis/colbertv2.0-bn

Updated Nov 21, 2024 • 13 • 3

reacted to MonsterMMORPG's post with 🔥 5 months ago

Post

3712

Stability AI has published its newest and most powerful model, Stable Diffusion 3.5 Large. Unlike FLUX, this is a full model, not a distilled one, and it has huge potential. I have done extensive research and am publishing all of it in this video, which covers how to use SD 3.5 Large with the best settings. I also show how to use FLUX DEV with the best possible configuration, and I run a large comparison between SD 3.5 and FLUX so you can see which one wins.

https://youtu.be/-zOKhoO9a5s

62 prompts tested across all experiments to find the best sampler + scheduler for Stable Diffusion 3.5 Large, and to compare SD 3.5 Large vs FLUX DEV: https://youtu.be/-zOKhoO9a5s

FLUX Dev vs SD 3.5 Large fully compared.

SD 3.5 Large FP16 vs Scaled FP8 fully compared.

T5 XXL FP8 vs Scaled FP8 vs FP16 fully compared.

FLUX FP16 vs Scaled FP8 fully compared.

The tutorial also shows how to install SwarmUI on Windows, Massed Compute, and RunPod.

I have also shown how to use FLUX and SD 3.5 Large in detail.

liked 2 datasets 5 months ago

vikhyatk/lofi

Viewer • Updated Oct 26, 2024 • 857k • 6.89k • 80

neulab/PangeaInstruct

Updated Feb 2 • 555 • 82

reacted to kz919's post with 🚀 6 months ago

Post

1519

Just for the meme.

But the clear lesson I learnt from building these demos is that the more powerful the underlying base model is, the closer you will get to GPT4o1. CoT is nothing more than inducing the latent reasoning capability of the model.

kz919/GPT4-O1-Proximas