Unlike the impressive DeepSeek-R1(-Zero), which targets tasks with verifiable rewards, this project is a pure reinforcement learning (RL) experiment applied to an open-domain task: creative advertisement generation.
Objectives:
- To investigate the feasibility of applying R1-like methods to an open-domain task without a verifiable ground-truth reward, and to at least demonstrate their potential.
- To explore whether <think> and <answer> rewards can be explicitly designed, based on human prior knowledge, to provide strong guidance through RL (see the sketch below).

Note: Our goal is not to induce self-reflective thinking, but to align with human thought processes purely through RL, without any supervised fine-tuning (SFT) on a constructed dataset.
Despite its small size, the resulting 1.5B-GRPO model demonstrates intriguing generative capabilities, though it's still far from perfect.
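As an illustration of how such explicit rewards can be designed, here is a minimal sketch of a rule-based format reward for <think>/<answer> outputs; the regex checks, weights, and length bonus are illustrative assumptions, not the project's actual reward design.

```python
import re

def format_reward(completion: str) -> float:
    """Toy reward scoring whether a completion follows the
    <think>...</think><answer>...</answer> template. Illustrative only."""
    reward = 0.0
    think = re.search(r"<think>(.*?)</think>", completion, re.DOTALL)
    answer = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    if think:
        reward += 0.5  # an explicit reasoning block is present
    if answer:
        reward += 0.5  # a final answer block is present
    # Small bonus for non-trivial reasoning length (hypothetical threshold).
    if think and len(think.group(1).split()) >= 30:
        reward += 0.25
    return reward

# Example: a well-formatted completion earns the maximum of 1.25.
sample = "<think>" + "reasoning " * 30 + "</think><answer>Buy our coffee!</answer>"
print(format_reward(sample))
```

In GRPO-style training, rewards like this are typically combined with content rewards and normalized within each group of sampled completions to form advantages.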
reacted to AdinaY's post with 🔥 about 12 hours ago
Ovis2 🔥 a multimodal LLM released by Alibaba AIDC team. AIDC-AI/ovis2-67ab36c7e497429034874464
✨ 1B/2B/4B/8B/16B/34B
✨ Strong CoT for deeper problem solving
✨ Multilingual OCR: expanded beyond English & Chinese, with better data extraction
reacted to onekq's post about 12 hours ago
Players include Hugging Face (Open R1), Stanford (simple scaling), Berkeley (Bespoke, Open Thoughts, etc.), ServiceNow, and others. I know there is another work from HKUST but couldn't find it on 🤗. Let me know if I missed any teams.
reacted to Keltezaa's post about 19 hours ago
Why do all the text-to-image models running on the HF Inference API fail with the error "Model strangerzonehf/Neon-Impressionism-Flux does not exist"?
It used to work last month.
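For context, this is the kind of call that surfaces the error; a minimal sketch using huggingface_hub's InferenceClient, with a made-up prompt:

```python
from huggingface_hub import InferenceClient

# Serverless Inference API call for the model in question.
client = InferenceClient(model="strangerzonehf/Neon-Impressionism-Flux")

try:
    image = client.text_to_image("an impressionist street scene with neon lights")
    image.save("output.png")
except Exception as err:
    # Currently this fails with a message along the lines of "Model ... does not exist".
    print(f"Inference API call failed: {err}")
```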
reacted to fdaudens's post with ❤️ about 19 hours ago
⭐️ The AI Energy Score project just launched - this is a game-changer for making informed decisions about AI deployment.
You can now see exactly how much energy your chosen model will consume, with a simple 5-star rating system. Think appliance energy labels, but for AI.
Looking at transcription models on the leaderboard is fascinating: choosing between whisper-tiny and whisper-large-v3 can make a 7x difference in energy use. Real-time data on these tradeoffs changes everything.
166 models already evaluated across 10 different tasks, from text generation to image classification. The whole thing is public and you can submit your own models to test.
Why this matters:
- Teams can pick efficient models that still get the job done
- Developers can optimize for energy use from day one
- Organizations can finally predict their AI environmental impact
If you're building with AI at any scale, definitely worth checking out.
Specifically, the duplication of layers in frankenmerges serves a purpose similar to what occurs in their recurrent-depth architecture. Successful frankenmerges that operate without additional fine-tuning are able to recover, or "heal", from any damage caused by abrupt transitions between layer blocks. Replicated layer blocks that remain operational can provide functional benefits grounded in latent reasoning. Frankenmerges can also produce hybrid reasoning by splicing together the latent reasoning of different models.
Back in April 2024, I was able to duplicate a few layers in the Llama 3 8B model, turning it into a 9B model without significantly harming benchmarks, despite any transition damage. grimjim/llama-3-experiment-v1-9B My informal experimentation suggested that latent reasoning circuits could occupy contiguous stacks of 2-4 layers, though the result was highly sensitive to the choice of transition location between layers.
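For readers who want to experiment with this, here is a minimal sketch of duplicating a contiguous block of decoder layers in a Llama-style model using plain transformers; the layer range and output path are illustrative assumptions, not the exact configuration behind grimjim/llama-3-experiment-v1-9B.

```python
import copy
import torch
from transformers import AutoModelForCausalLM

# Load the base model (any Llama-style checkpoint works the same way).
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B", torch_dtype=torch.bfloat16
)

layers = model.model.layers  # ModuleList of decoder layers
start, end = 12, 16          # illustrative contiguous block to duplicate

# Splice a deep copy of the chosen block back in right after the original block.
duplicated = [copy.deepcopy(layers[i]) for i in range(start, end)]
new_layers = list(layers[:end]) + duplicated + list(layers[end:])

model.model.layers = torch.nn.ModuleList(new_layers)
model.config.num_hidden_layers = len(new_layers)

# Re-index attention modules so each layer writes to its own KV-cache slot.
for idx, layer in enumerate(model.model.layers):
    layer.self_attn.layer_idx = idx

model.save_pretrained("llama-3-layer-duplicated")
```

Checking perplexity or a few quick benchmarks after each candidate insertion point is an easy way to find transition locations the model tolerates.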
⚡ Can Stable Diffusion's visual expertise enhance Llama-3.2?
Lavender: efficiently fine-tunes advanced vision-language models by aligning their text-vision attention with Stable Diffusion.
Paper: Diffusion Instruction Tuning (2502.06814)
Key Highlights:
✅ Significant Gains: +30% on 20 tasks, +68% on OOD WorldMedQA
✅ Data-Efficient: Needs only 0.13M samples (~2.5% of typical VLM datasets)
✅ Low Compute: Finetunes in ~1 day on 8 NVIDIA A10G GPUs
✅ Model-Agnostic: Works with Llama-3.2-11B, MiniCPM-Llama3-v2.5 & more
✅ Precise Alignment: Transfers strong text-vision alignment from Stable Diffusion
✅ Open-Source: Code, data & finetuned models will be available
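To make "aligning text-vision attention with Stable Diffusion" more concrete, here is a rough conceptual sketch of an attention-alignment loss; this is my own illustration under assumed tensor shapes, not the paper's actual objective or code.

```python
import torch
import torch.nn.functional as F

def attention_alignment_loss(vlm_attn: torch.Tensor, sd_attn: torch.Tensor) -> torch.Tensor:
    """Conceptual illustration: pull a VLM's text-to-image attention maps toward
    the attention maps Stable Diffusion produces for the same text/image pair.

    Both tensors are assumed to have shape (batch, text_tokens, image_patches)
    over the same patch grid; a real implementation would first need to align
    token and patch resolutions between the two models.
    """
    # Normalize each map into a distribution over image patches.
    vlm_dist = vlm_attn / (vlm_attn.sum(dim=-1, keepdim=True) + 1e-8)
    sd_dist = sd_attn / (sd_attn.sum(dim=-1, keepdim=True) + 1e-8)
    # Penalize the mismatch; this term would be added to the usual fine-tuning loss.
    return F.mse_loss(vlm_dist, sd_dist)

# Toy usage with random tensors standing in for real attention maps.
loss = attention_alignment_loss(torch.rand(2, 16, 64), torch.rand(2, 16, 64))
print(loss.item())
```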
Toward the end of last year, the Xet team provided an inside look into the foundations of how we plan to enable rapid experimentation and iteration for the AI builders on the Hub: https://huggingface.co/blog/from-files-to-chunks
But it turns out chunks aren't all you need!
Our goal is to bring:
- Faster uploads
- Speedy downloads
- All without sacrificing your workflow
To do that, we need the infrastructure and system design to back it up. As we prepare to roll out the first Xet-backed repositories on the Hub, we wrote up a post explaining the nitty-gritty details of the decisions that bring this to life: https://huggingface.co/blog/from-chunks-to-blocks
Complete with an interactive visualization that shows the power of deduplication in action - taking a 191GB repo to ~97GB and shaving a few hours off upload times.
The darker each block in the heatmap, the more we dedupe, the less we have to transfer. Clicking on a file's blocks shows all other files that share blocks.
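As a back-of-the-envelope illustration of why dedup shrinks transfers (not the Xet implementation, which uses content-defined chunking and groups chunks into blocks), here is a toy sketch with fixed-size chunks and made-up file contents:

```python
import hashlib

def dedup_upload_size(files: dict[str, bytes], chunk_size: int = 64 * 1024) -> tuple[int, int]:
    """Toy model of chunk-level dedup: hash fixed-size chunks and count how many
    bytes actually need to be transferred when duplicate chunks are skipped."""
    seen: set[str] = set()
    total_bytes = transferred_bytes = 0
    for data in files.values():
        for start in range(0, len(data), chunk_size):
            chunk = data[start:start + chunk_size]
            total_bytes += len(chunk)
            digest = hashlib.sha256(chunk).hexdigest()
            if digest not in seen:  # only chunks we haven't seen cross the wire
                seen.add(digest)
                transferred_bytes += len(chunk)
    return total_bytes, transferred_bytes

# Two file revisions that share most of their bytes dedupe heavily.
v1 = b"A" * 1_000_000
v2 = b"A" * 900_000 + b"B" * 100_000
total, sent = dedup_upload_size({"model-v1.bin": v1, "model-v2.bin": v2})
print(f"{total} bytes logical, {sent} bytes transferred")
```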
A collection of 1.38M educational texts featuring:
- 1.33M educational presentations with full slide content
- 47K academic documents with complete text
- Multilingual content (Russian, Ukrainian, English)
- Full metadata including titles and descriptions
All content is available under CC0 license, allowing unrestricted use including commercial applications.
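If you want to skim the collection before committing to a full download, a streaming load along these lines should work; the repo id below is a hypothetical placeholder for the dataset's actual path on the Hub.

```python
from datasets import load_dataset

# Hypothetical repo id: substitute the dataset's real path on the Hub.
ds = load_dataset("username/educational-texts", split="train", streaming=True)

# Peek at the first record's fields without downloading all 1.38M texts.
first = next(iter(ds))
print(first.keys())
```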
I am excited to share that I've successfully completed Unit 1: Foundations of Agents in the Hugging Face Agents Course. Exploring the fundamentals of AI agents has been an insightful journey, and I'm looking forward to applying these concepts in real-world applications. Big thanks to the Hugging Face team for this amazing learning opportunity! 🤗 Check out the course here: https://huggingface.co/learn/agents-course/
It now includes:
- a live stream of the progress being made on the task (see included video),
- the following components:
  1. Automatic prompt optimization
  2. An orchestrator deciding which agent to call dynamically, including feedback from a human (human-in-the-loop)
  3. A coding agent to complete the task
  4. A code-reviewing agent that iteratively provides feedback to improve the code generated by the coding agent until the code meets the required criteria, after which it is approved
  5. A testing agent that tests the approved code or provides information on how to test it
  6. A documentation agent that provides documentation and a help message for the approved and tested code
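Here is a minimal, framework-agnostic sketch of how such a review loop between the coding and reviewing agents could be wired up; the agent callables and the "APPROVED" convention are hypothetical stand-ins, not the author's actual implementation.

```python
from typing import Callable

# Hypothetical agent callables: each takes a prompt/context string and returns text.
Agent = Callable[[str], str]

def orchestrate(task: str, coder: Agent, reviewer: Agent, tester: Agent,
                documenter: Agent, max_rounds: int = 5) -> dict:
    """Sketch of an orchestration loop: code -> review -> revise until approved,
    then test and document. Prompt optimization and human-in-the-loop omitted."""
    code = coder(task)
    for _ in range(max_rounds):
        review = reviewer(f"Task:\n{task}\n\nCode:\n{code}")
        if "APPROVED" in review:  # hypothetical approval convention
            break
        code = coder(f"Task:\n{task}\n\nCode:\n{code}\n\nReviewer feedback:\n{review}")
    test_report = tester(code)
    docs = documenter(code)
    return {"code": code, "tests": test_report, "docs": docs}
```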
"๐ฎ๐ฌ๐ฎ๐ฑ ๐๐ถ๐น๐น ๐ฏ๐ฒ ๐๐ต๐ฒ ๐๐ฒ๐ฎ๐ฟ ๐ผ๐ณ ๐๐ ๐ฎ๐ด๐ฒ๐ป๐๐": this statement has often been made, here are numbers to support it.
I've plotted the progress of AI agents on the GAIA test set, and it seems they're on track to catch up with the human baseline in early 2026.
And that progress is still driven mostly by the improvement of base LLMs: progress would be even faster with fine-tuned agentic models.