
Daniel Schmidt

danschmidt88

AI & ML interests

None yet

Recent Activity

reacted to albertvillanova's post with 🔥 9 days ago
liked a Space 9 days ago
smolagents/smolagents-leaderboard

Organizations

None yet

danschmidt88's activity

reacted to albertvillanova's post with 🔥 9 days ago
🚀 Big news for AI agents! With the latest release of smolagents, you can now securely execute Python code in sandboxed Docker or E2B environments. 🦾🔒

Here's why this is a game-changer for agent-based systems: 🧵👇

1️⃣ Security First 🔐
Running AI agents in unrestricted Python environments is risky! With sandboxing, your agents are isolated, preventing unintended file access, network abuse, or system modifications.

2️⃣ Deterministic & Reproducible Runs 📦
By running agents in containerized environments, you ensure that every execution happens in a controlled and predictable setting—no more environment mismatches or dependency issues!

3️⃣ Resource Control & Limits 🚦
Docker and E2B allow you to enforce CPU, memory, and execution time limits, so rogue or inefficient agents don’t spiral out of control.

4️⃣ Safer Code Execution in Production 🏭
Deploy AI agents confidently, knowing that any generated code runs in an ephemeral, isolated environment, protecting your host machine and infrastructure.

5️⃣ Easy to Integrate 🛠️
With smolagents, you can simply configure your agent to use Docker or E2B as its execution backend—no need for complex security setups! (A minimal configuration sketch follows this post.)

6️⃣ Perfect for Autonomous AI Agents 🤖
If your AI agents generate and execute code dynamically, this is a must-have to avoid security pitfalls while enabling advanced automation.

⚡ Get started now: https://github.com/huggingface/smolagents

What will you build with smolagents? Let us know! 🚀💡
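For reference, here is what point 5️⃣ above might look like in practice: a minimal sketch of the sandboxed setup, assuming the `executor_type` argument on CodeAgent ("docker" or "e2b") described in the smolagents release notes. Exact parameter and model class names may differ across versions, so check the docs for the installed release.

    # Minimal sketch of sandboxed execution with smolagents, based on the
    # release described in the post above. Assumes CodeAgent accepts an
    # `executor_type` argument ("docker" or "e2b"); verify the exact name
    # against the smolagents docs for your version.
    from smolagents import CodeAgent, HfApiModel

    model = HfApiModel()  # hosted Hugging Face inference model; any smolagents model class works

    agent = CodeAgent(
        tools=[],                # the agent writes and runs plain Python itself
        model=model,
        executor_type="docker",  # run generated code in a Docker container rather
                                 # than on the host; "e2b" targets an E2B cloud
                                 # sandbox and expects E2B_API_KEY in the environment
    )

    agent.run("Compute the first 20 prime numbers and print them.")

With this setup, each run executes the generated code in a fresh, isolated environment rather than on the host; the Docker backend needs a running Docker daemon, and the E2B backend an E2B account.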
reacted to clefourrier's post with 🚀 9 days ago
The Gemma3 family is out! I've been reading the tech report, and one section was really interesting to me from a methods/scientific-fairness point of view.

Instead of making over-hyped comparisons, they clearly state that **results are reported in a setup that is advantageous to their models**.
(Which everybody does, but usually doesn't say.)

For a tech report, it makes a lot of sense to report model performance when the model is used optimally!
On leaderboards, on the other hand, comparisons are apples to apples, but in a potentially suboptimal way for a given model family (much as some users interact sub-optimally with models).

It also contains a cool section (6) on training-data memorization rates! It's important to know whether your model will output its training data verbatim: that's always an issue for privacy/copyright/..., but very much for evaluation as well!

Because if your model knows its evals by heart, you're not testing for generalization.
upvoted an article 25 days ago

Trace & Evaluate your Agent with Arize Phoenix

upvoted an article about 1 month ago

PaliGemma 2 Mix - New Instruction Vision Language Models by Google

upvoted an article about 2 months ago

Open-source DeepResearch – Freeing our search agents

liked a Space over 1 year ago