open/acc

community


Recent Activity

open-acc's activity

burtenshaw 
posted an update about 17 hours ago
The Hugging Face agents course is finally out!

👉 https://huggingface.co/agents-course

This first unit of the course sets you up with all the fundamentals to become a pro in agents.

- What's an AI Agent?
- What are LLMs?
- Messages and Special Tokens
- Understanding AI Agents through the Thought-Action-Observation Cycle
- Thought, Internal Reasoning and the Re-Act Approach
- Actions, Enabling the Agent to Engage with Its Environment
- Observe, Integrating Feedback to Reflect and Adapt
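To make the Thought-Action-Observation cycle from the list above concrete, here is a minimal sketch in Python. It is not taken from the course: `scripted_model`, the `TOOLS` registry, and the `final_answer` action name are all hypothetical stand-ins, and a real agent would call an LLM where the scripted function sits.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical tool registry: the tool name and behaviour are illustrative only.
TOOLS: Dict[str, Callable[[str], str]] = {
    "add": lambda arg: str(sum(int(x) for x in arg.split("+"))),
}


def scripted_model(history: List[str]) -> Tuple[str, str, str]:
    """Stand-in for an LLM call: returns (thought, action, action_input)."""
    observations = [line for line in history if line.startswith("Observation:")]
    if not observations:
        # No feedback yet, so decide to call a tool.
        return ("I need to compute the sum.", "add", "2+3")
    # Feedback is available, so finish with the last observed value.
    answer = observations[-1].split(":", 1)[1].strip()
    return ("I have the result.", "final_answer", answer)


def run_agent(question: str, max_steps: int = 5) -> Tuple[str, List[str]]:
    """Loop over Thought -> Action -> Observation until a final answer appears."""
    history = [f"Question: {question}"]
    for _ in range(max_steps):
        thought, action, action_input = scripted_model(history)
        history.append(f"Thought: {thought}")
        if action == "final_answer":
            return action_input, history
        history.append(f"Action: {action}[{action_input}]")
        observation = TOOLS[action](action_input)  # act on the environment
        history.append(f"Observation: {observation}")  # feed the result back
    raise RuntimeError("agent did not finish within max_steps")


answer, trace = run_agent("What is 2 + 3?")
print("\n".join(trace))
print("Answer:", answer)
```

The point of the sketch is only the loop shape: each observation is appended to the history so the next thought can take it into account.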
mmhamdy 
posted an update about 19 hours ago
⛓ Evaluating Long Context #2: SCROLLS and ZeroSCROLLS

In this series of posts tracing the history of long context evaluation, we started with Long Range Arena (LRA). Introduced in 2020, LRA is one of the earliest benchmarks designed to tackle the challenge of long context evaluation. However, it was introduced to evaluate the transformer architecture in general rather than LLMs.

📜 The SCROLLS benchmark, introduced in 2022, addresses this gap in NLP/LLM research. SCROLLS challenges models with tasks that require reasoning over extended sequences (according to 2022 standards). So, what does it offer?

1️⃣ Long Text Focus: SCROLLS (unlike LRA) focuses mainly on text and contains inputs with thousands of words, testing models' ability to synthesize information across lengthy documents.
2️⃣ Diverse Tasks: Includes summarization, question answering, and natural language inference across domains like literature, science, and business.
3️⃣ Unified Format: All datasets are available in a text-to-text format, facilitating easy evaluation and comparison of models.

Building on SCROLLS, ZeroSCROLLS takes long text evaluation to the next level by focusing on zero-shot learning. Other features include:

1️⃣ New Tasks: Introduces tasks like sentiment aggregation and sorting book chapter summaries.
2️⃣ Leaderboard: A live leaderboard encourages continuous improvement and competition among researchers.

💡 What are some other landmark benchmarks in the history of long context evaluation? Feel free to share your thoughts and suggestions in the comments.

- SCROLLS Paper: SCROLLS: Standardized CompaRison Over Long Language Sequences (2201.03533)
- ZeroSCROLLS Paper: ZeroSCROLLS: A Zero-Shot Benchmark for Long Text Understanding (2305.14196)
merve 
posted an update 4 days ago
Interesting releases in open AI this week, let's recap 🤠 merve/feb-7-releases-67a5f7d7f172d8bfe0dd66f4

🤖 Robotics
> Pi0, the first open-source foundation vision-language-action model, was released in LeRobot (Apache 2.0)

💬 LLMs
> Groundbreaking: s1 is a simpler approach to test-time scaling. The release comes with the small s1K dataset of 1k question-reasoning trace pairs (from Gemini-Thinking Exp), on which they fine-tune Qwen2.5-32B-Instruct to get s1-32B, outperforming o1-preview on math 🤯 s1-32B and s1K are out!
> Adyen released DABstep, a new benchmark along with its leaderboard demo for agents doing data analysis
> Krutrim released Krutrim-2 instruct, a new 12B model based on NeMo12B trained and aligned on Indic languages, a new multilingual sentence embedding model (based on STSB-XLM-R), and a translation model for Indic languages

👀 Multimodal
> PKU released Align-DS-V, a model aligned using their new technique called LLF for all modalities (image-text-audio), along with the dataset Align Anything
> OLA-7B is a new any-to-any model by Tencent that can take text, image, video, and audio data with a context window of 32k tokens and output text and speech in English and Chinese
> Krutrim released Chitrarth, a new vision language model for Indic languages and English

🖼️ Vision
> BiRefNet_HR is a new higher resolution BiRefNet for background removal

🗣️ Audio
> kyutai released Hibiki, a real-time speech-to-speech translation model 🤯 It's available for French-English translation
> Krutrim released Dhwani, a new STT model for Indic languages
> They also released a new dataset for STT-TTS

🖼️ Image Generation
> Lumina released Lumina-Image-2.0, a 2B-parameter flow-based DiT for text-to-image generation
> Tencent released Hunyuan3D-2, a 3D asset generation model based on DiT and Hunyuan3D-Paint
> boreal-hl-v1 is a new boring photorealistic image generation LoRA based on Hunyuan
burtenshaw 
posted an update 5 days ago
SmolLM2 paper is out! 😊

😍 Why do I love it? Because it facilitates teaching and learning!

Over the past few months I've engaged with (no joke) thousands of students based on SmolLM.

- People have inferred, fine-tuned, aligned, and evaluated this smol model.
- People used their own machines and free tools like colab, kaggle, and spaces.
- People tackled use cases in their job, for fun, in their own language, and with their friends.

upvote the paper SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model (2502.02737)
merve 
posted an update 5 days ago
Tonic 
posted an update 8 days ago
🙋🏻‍♂️hey there folks ,

Goedel's Theorem Prover is now being demo'ed on huggingface : Tonic/Math

give it a try !
csabakecskemeti 
posted an update 9 days ago
Check out my idea:
LLmaaS - Local LLM as a Service

With LLmaaS, I propose leveraging locally running LLMs as a service, providing a standardized way for websites to access and utilize them for LLM-powered operations directly on the user’s device.

Demo, code, more detailed description.
https://devquasar.com/llmaas/
https://github.com/csabakecskemeti/LLmaaS
https://youtu.be/OOWGr8jcP5Q

Call for contributors
Join me and help develop the LLmaaS proxy into a general-purpose tool for leveraging local LLMs on the web, with security measures built in.
I'm looking for help to make the proxy more generic, so that it supports multiple local LLM services without any change on the HTML side.
I'm also looking for ideas on how to make the HTML part more modular and easy to use.
merve 
posted an update 11 days ago
This week in open AI was 🔥 Let's recap! 🤗 merve/january-31-releases-679a10669bd4030090c5de4d
LLMs 💬
> Huge: AllenAI released new Tülu models that outperform DeepSeek R1 using Reinforcement Learning with Verifiable Reward (RLVR) based on Llama 3.1 405B 🔥
> Mistral AI is back to open-source with their "small" 24B models (base & SFT), with Apache 2.0 license 😱
> Alibaba Qwen released their 1M context length models Qwen2.5-Instruct-1M, great for agentic use with Apache 2.0 license 🔥
> Arcee AI released Virtuoso-medium, 32.8B LLMs distilled from DeepSeek V3 with a dataset of 5B+ tokens
> Velvet-14B is a new family of 14B Italian LLMs trained on 10T tokens in six languages
> OpenThinker-7B is a fine-tuned version of Qwen2.5-7B-Instruct on the OpenThoughts dataset

VLMs & vision 👀
> Alibaba Qwen is back with Qwen2.5VL, amazing new capabilities ranging from agentic computer use to zero-shot localization 🔥
> NVIDIA released a new series of Eagle2 models with 1B and 9B sizes
> DeepSeek released Janus-Pro, a new any-to-any model (image-text generation from image-text input) with MIT license
> BEN2 is a new background removal model with MIT license!

Audio 🗣️
> YuE is a new open-source music generation foundation model, lyrics-to-song generation

Codebase 👩🏻‍💻
> We are open-sourcing our SmolVLM training and eval codebase! https://github.com/huggingface/smollm/tree/main/vision
> Open-R1 is an open-source reproduction of R1 by the @huggingface science team https://huggingface.co/blog/open-r1
ameerazam08 
posted an update 12 days ago
csabakecskemeti 
posted an update 13 days ago
AtAndDev 
posted an update 14 days ago
everywhere i go i see his face
cfahlgren1 
posted an update 14 days ago
If you haven't seen it yet, we just released Inference Providers 🔀

> 4 new serverless inference providers on the Hub 🤯
> Use your HF API key or personal key with all providers 🔑
> Chat with Deepseek R1, V3, and more on HF Hub 🐋
> We support Sambanova, TogetherAI, Replicate, and Fal.ai 💪

Best of all, we don't charge any markup on top of the provider 🫰 Have you tried it out yet? HF Pro accounts get $2 of free usage for the provider inference.
Tonic 
posted an update 15 days ago
🙋🏻‍♂️ Hey there folks ,

our team made a game during the @mistral-game-jam and we're trying to win the community award !

try our game out and drop us a ❤️ like basically to vote for us !

Mistral-AI-Game-Jam/TextToSurvive

hope you like it !
clem 
posted an update 15 days ago
AI is not a zero-sum game. Open-source AI is the tide that lifts all boats!
burtenshaw 
posted an update 16 days ago
A manic few days in open source AI, with game-changing developments all over the place. Here's a roundup of the resources:

- The science team at @huggingface reproduced and open-sourced DeepSeek R1. https://github.com/huggingface/open-r1
- @qwen released a series of models with 1 million token context! https://qwenlm.github.io/blog/qwen2.5-1m/
- SmolVLM got even smaller with completely new variants at 256m and 500m https://huggingface.co/blog/smolervlm

There's so much you could do with these developments, especially by combining them into agentic applications or fine-tuning them on your use case.
csabakecskemeti 
posted an update 17 days ago
I've run the open llm leaderboard evaluations + hellaswag on deepseek-ai/DeepSeek-R1-Distill-Llama-8B and compared it to meta-llama/Llama-3.1-8B-Instruct, and at first glance R1 does not beat Llama overall.

If anyone wants to double-check, the results are posted here:
https://github.com/csabakecskemeti/lm_eval_results

Did I make a mistake, or is this (at least the distilled version) not as good as the competition?

I'll run the same on the Qwen 7B distilled version too.