Hugging Face

Enterprise

company

Verified

https://huggingface.co

huggingface

Activity Feed

AI & ML interests

The AI community building the future.

Articles

Yay! Organizations can now publish blog Articles

22 days ago

• 33

huggingface's activity

FranckAbgrall

updated a dataset about 3 hours ago

huggingface/documentation-images

Viewer • Updated about 3 hours ago • 50 • 3.87M • 47

pagezyhf

updated a dataset about 10 hours ago

huggingface/documentation-images

lewtun

posted an update 1 day ago

Post

2177

Introducing OpenR1-Math-220k!

open-r1/OpenR1-Math-220k

The community has been busy distilling DeepSeek-R1 from inference providers, but we decided to have a go at doing it ourselves from scratch 💪

What’s new compared to existing reasoning datasets?

♾ Based on AI-MO/NuminaMath-1.5: we focus on math reasoning traces and generate answers for problems in NuminaMath 1.5, an improved version of the popular NuminaMath-CoT dataset.

🐳 800k R1 reasoning traces: We generate two answers for 400k problems using DeepSeek R1. The filtered dataset contains 220k problems with correct reasoning traces.

📀 512 H100s running locally: Instead of relying on an API, we leverage vLLM and SGLang to run generations locally on our science cluster, generating 180k reasoning traces per day.

⏳ Automated filtering: We apply Math Verify to retain only problems with at least one correct answer. We also leverage Llama3.3-70B-Instruct as a judge to retrieve more correct examples (e.g. for cases with malformed answers that can’t be verified with a rule-based parser).

📊 We match the performance of DeepSeek-R1-Distill-Qwen-7B by finetuning Qwen2.5-Math-7B-Instruct on our dataset.

🔎 Read our blog post for all the nitty gritty details: https://huggingface.co/blog/open-r1/update-2
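The filtering step described in the post (keep a problem only if at least one generated answer is correct, with a more lenient judge as a fallback) can be sketched roughly as follows. This is an illustrative outline, not the Open-R1 pipeline: the record layout, `exact_match`, `contains_reference`, and `filter_problems` are all hypothetical stand-ins, and the toy string checks take the place of Math Verify and the LLM judge.

```python
from typing import Callable, Optional

# Hypothetical check signature: (generated answer, reference answer) -> bool
Check = Callable[[str, str], bool]


def exact_match(answer: str, reference: str) -> bool:
    """Toy stand-in for a rule-based verifier: normalized string equality."""
    return answer.strip().lower() == reference.strip().lower()


def contains_reference(answer: str, reference: str) -> bool:
    """Toy stand-in for a lenient judge: accepts answers that embed the reference."""
    return reference.strip().lower() in answer.strip().lower()


def filter_problems(
    problems: list[dict],
    verify: Check = exact_match,
    judge: Optional[Check] = None,
) -> list[dict]:
    """Keep problems that have at least one answer accepted by `verify`.

    If no answer passes `verify` and a `judge` is given, the judge gets a
    second look at the same answers (the fallback for malformed outputs).
    """
    kept = []
    for problem in problems:
        reference = problem["reference"]
        correct = [a for a in problem["answers"] if verify(a, reference)]
        if not correct and judge is not None:
            correct = [a for a in problem["answers"] if judge(a, reference)]
        if correct:
            kept.append({**problem, "correct_answers": correct})
    return kept


# Made-up example records, two generated answers per problem.
problems = [
    {"id": 1, "reference": "42", "answers": ["42", "41"]},
    {"id": 2, "reference": "7", "answers": ["8", "9"]},
    {"id": 3, "reference": "1/2", "answers": ["The answer is 1/2", "3"]},
]

strict = filter_problems(problems)
lenient = filter_problems(problems, judge=contains_reference)
```

With these made-up records, the strict pass drops any problem whose answers are all wrong or malformed, while the judge fallback can recover a problem whose answer is correct but wrapped in extra text.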

julien-c

in huggingface/HuggingDiscussions 1 day ago

[FEEDBACK] Local apps

#31 opened 8 months ago by

kramp

davidberenstein1957

posted an update 1 day ago

Post

1048

Fine-tune DeepSeek-R1 with a Synthetic Reasoning Dataset

Blog: https://huggingface.co/blog/sdiazlor/fine-tune-deepseek-with-a-synthetic-reasoning-data

giadap

authored a paper 1 day ago

Fully Autonomous AI Agents Should Not be Developed

Paper • 2502.02649 • Published 7 days ago • 20

SaylorTwift

updated a dataset 1 day ago

huggingface/documentation-images

medmekk

updated a dataset 1 day ago

huggingface/documentation-images

medmekk

in huggingface/documentation-images 1 day ago

add_image_doc_fp8

#431 opened 1 day ago by

medmekk

lysandre

updated a dataset 1 day ago

huggingface/transformers-metadata

Viewer • Updated 1 day ago • 1.56k • 636 • 16

fdaudens

posted an update 1 day ago

Post

1033

🔥 Video AI is taking over! Out of 17 papers dropped on Hugging Face today, 6 are video-focused - from Sliding Tile Attention to On-device Sora. The race for next-gen video tech is heating up! 🎬🚀

m-ric

posted an update 4 days ago

Post

3006

Adyen's new Data Agents Benchmark shows that DeepSeek-R1 struggles on data science tasks! ❌

➡️ How well do reasoning models perform on agentic tasks? Until now, all indicators seemed to show that they worked really well. On our recent reproduction of Deep Search, OpenAI's o1 was by far the best model to power an agentic system.

So when our partner Adyen built a huge benchmark of 450 data science tasks, and built data agents with smolagents to test different models, I expected reasoning models like o1 or DeepSeek-R1 to destroy the tasks at hand.

👎 But they really missed the mark. DeepSeek-R1 only got 1 or 2 out of 10 questions correct. Similarly, o1 was only at ~13% correct answers.

🧐 These results really surprised us. We checked them thoroughly. We even thought our APIs for DeepSeek were broken, so colleagues Leandro and Anton helped me start custom instances of R1 on our own H100s to make sure it worked well.
But there seemed to be no mistake. Reasoning LLMs actually did not seem that smart. Often, these models made basic mistakes, like forgetting the content of a folder that they had just explored, misspelling file names, or hallucinating data. Even though they do great at exploring webpages through several steps, the same level of multi-step planning seemed much harder to achieve when reasoning over files and data.

It seems like there's still lots of work to do in the Agents x Data space. Congrats to Adyen for this great benchmark, and we look forward to seeing people propose better agents! 🚀

Read more in the blog post 👉 https://huggingface.co/blog/dabstep

fdaudens

posted an update 5 days ago

Post

1974

📢 SmolLM2 paper released! Learn how the 🤗 team built one of the best small language models: from data choices to training insights. Check out our findings and share your thoughts! 🤏💡

Check it out: SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model (2502.02737)