and it's ranked as the number one model under the 25B parameter size mark.
Now, I said "I think" rather than "I am sure" because this model used the same evaluation metric the AraGen developers use (3C3H) as a reward model to improve its responses, and that raises a question: is this good for users, or is it just another kind of overfitting we don't want? (A rough sketch of the idea follows below.)
I don't know whether this is a good thing or a bad thing, but what I do know is that you can try it here: Navid-AI/Yehia-7B-preview
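The post doesn't say how 3C3H was wired into training, so here is only a hedged sketch of the general idea: an LLM judge scores each response on the six 3C3H dimensions and the scores are collapsed into a scalar reward that a preference/RL trainer can optimize. The class, the equal weighting, and the function names below are illustrative assumptions, not the Navid-AI recipe.

```python
# Hypothetical sketch: turning the AraGen 3C3H rubric into a scalar reward.
# This is NOT Navid-AI's training code; names and weights are assumptions.

from dataclasses import dataclass

# The six 3C3H dimensions used by the AraGen leaderboard.
DIMENSIONS = ("correctness", "completeness", "conciseness",
              "helpfulness", "honesty", "harmlessness")

@dataclass
class JudgeScores:
    """Per-dimension scores returned by an LLM judge, each in [0, 1]."""
    correctness: float
    completeness: float
    conciseness: float
    helpfulness: float
    honesty: float
    harmlessness: float

def reward_3c3h(scores: JudgeScores) -> float:
    """Collapse the 3C3H rubric into one scalar reward.

    A simple mean is used here; the leaderboard's own weighting may differ.
    """
    values = [getattr(scores, dim) for dim in DIMENSIONS]
    return sum(values) / len(values)

# A function like this could be plugged into an RLHF/GRPO loop as a reward
# signal, so the policy is optimized directly against the metric the
# leaderboard reports -- which is exactly why the "is this overfitting?"
# question above is worth asking.

if __name__ == "__main__":
    demo = JudgeScores(0.9, 0.8, 0.7, 0.85, 0.95, 1.0)
    print(f"3C3H reward: {reward_3c3h(demo):.3f}")  # -> 0.867
```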
The community has been busy distilling DeepSeek-R1 from inference providers, but we decided to have a go at doing it ourselves from scratch.
What's new compared to existing reasoning datasets?
- Based on AI-MO/NuminaMath-1.5: we focus on math reasoning traces and generate answers for problems in NuminaMath 1.5, an improved version of the popular NuminaMath-CoT dataset.
- 800k R1 reasoning traces: we generate two answers for 400k problems using DeepSeek-R1. The filtered dataset contains 220k problems with correct reasoning traces.
- 512 H100s running locally: instead of relying on an API, we leverage vLLM and SGLang to run generations locally on our science cluster, producing 180k reasoning traces per day (a minimal generation sketch follows this list).
- Automated filtering: we apply Math Verify to retain only problems with at least one correct answer. We also use Llama3.3-70B-Instruct as a judge to recover more correct examples (e.g., cases with malformed answers that can't be verified by a rule-based parser); see the filtering sketch after this list.
- We match the performance of DeepSeek-Distill-Qwen-7B by finetuning Qwen-7B-Math-Instruct on our dataset (a minimal SFT sketch is included below).
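To make the generation step more concrete, here is a minimal sketch of local trace generation with vLLM. This is not the actual open-r1 pipeline code: the distilled checkpoint (the full R1 needs far more than one node), the prompt template, and the sampling settings are assumptions for illustration; only `n=2` mirrors the "two answers per problem" choice above.

```python
from datasets import load_dataset
from vllm import LLM, SamplingParams

# A small slice of NuminaMath-1.5 problems to keep the sketch cheap.
problems = load_dataset("AI-MO/NuminaMath-1.5", split="train[:100]")

# Assumption: a distilled R1 checkpoint so this fits on a single node;
# the post runs DeepSeek-R1 itself across 512 H100s.
llm = LLM(model="deepseek-ai/DeepSeek-R1-Distill-Qwen-7B")

# n=2 mirrors "two answers per problem" from the post.
params = SamplingParams(n=2, temperature=0.6, top_p=0.95, max_tokens=8192)

prompts = [
    "Solve the following math problem and put the final answer in \\boxed{}.\n\n"
    + example["problem"]
    for example in problems
]

outputs = llm.generate(prompts, params)
for example, out in zip(problems, outputs):
    traces = [completion.text for completion in out.outputs]  # two candidate traces
    # ...hand these off to the filtering step sketched below
```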
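The rule-based part of the filtering can be sketched with the math-verify package; the LLM-judge fallback is only indicated in a comment, and the real open-r1 filtering scripts may differ.

```python
from math_verify import parse, verify

def keep_problem(gold_answer: str, candidate_traces: list[str]) -> bool:
    """Keep a problem if at least one generated trace verifies against
    the reference answer."""
    gold = parse(gold_answer)
    for trace in candidate_traces:
        try:
            if verify(gold, parse(trace)):
                return True
        except Exception:
            # Malformed or unparseable answer: this is where the post falls
            # back to Llama3.3-70B-Instruct as a judge instead of discarding
            # the trace outright.
            continue
    return False

# One correct and one wrong candidate -> the problem is kept.
print(keep_problem(r"$\frac{1}{2}$", ["... so the answer is $0.5$.", "The answer is $2$."]))
```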
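And the finetuning step, as a minimal trl sketch. Assumptions here: that the released dataset is open-r1/OpenR1-Math-220k, that "Qwen-7B-Math-Instruct" refers to the Qwen/Qwen2.5-Math-7B-Instruct checkpoint, and that default SFT settings stand in for the actual hyperparameters (sequence length, packing, learning-rate schedule), which are not shown here.

```python
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Assumption: the filtered 220k-trace dataset described above.
dataset = load_dataset("open-r1/OpenR1-Math-220k", split="train")

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-Math-7B-Instruct",  # assumed to be the base model the post refers to
    train_dataset=dataset,
    args=SFTConfig(output_dir="qwen-math-7b-r1-distill"),
)
trainer.train()
```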
Simple summary of DeepSeek AI's Janus-Pro: a fresh take on multimodal AI! It builds on its predecessor, Janus, by tweaking the training methodology rather than the model architecture. The result? Improved performance in understanding and generating multimodal data.
Janus-Pro uses a three-stage training strategy, similar to Janus, but with key modifications:
- Stages 1 & 2: separate training for specific objectives, rather than mixing data.
- Stage 3: fine-tuning with a careful balance of multimodal data.
Benchmarks show Janus-Pro holds its own against specialized models like TokenFlow XL and MetaMorph, as well as image generation models like SD3 Medium and DALL-E 3.
The main limitation? Low image resolution (384x384). However, this seems like a strategic choice to focus on establishing a solid "recipe" for multimodal models. Future work will likely leverage this recipe and increased computing power to achieve higher resolutions.