
Danny

TheDrunkenSnail

AI & ML interests

None yet

Recent Activity


Organizations

Smol Community

TheDrunkenSnail's activity

reacted to eaddario's post with 👍 11 days ago
Squeezing out tensor bits?

I have been tinkering with quantization and pruning to reduce model sizes. So far, I've had modest success in producing, on average, 8% smaller versions with negligible loss of quality, and I think further reductions in the 10-15% range are realistic, but I've come across a behaviour I wasn't expecting!

Part of the process I'm following consists of quantizing the embedding and output layers aggressively. Since the embedding layer is more about lookup than complex computation, the vectors representing the relative distances between embeddings are usually preserved well enough, making this layer fairly robust to quantization. So far, so good.

The output layer, on the other hand, maps the final hidden state to the vocabulary logits and therefore, small changes in these logits could lead to a different probability distribution over the vocabulary, resulting in incorrect word predictions, or so I thought.

Surprisingly, I'm finding that even at Q2_K the loss of overall capability is minimal. Was this to be expected, or am I missing something?

I have published a version with all the test results if you want to give it a try: eaddario/DeepSeek-R1-Distill-Qwen-7B-GGUF

I'll upload other models as time allows.

Any ideas / clarifications / suggestions are very much welcomed!
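One way to build intuition for why aggressive output-layer quantization can be benign: quantization error shows up as a small additive perturbation of the logits, and greedy decoding only changes when that perturbation exceeds the margin between the top token and the runner-up. A minimal NumPy sketch with synthetic numbers (not from the post, and not a real LM head) makes the point:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, rows = 32000, 200

# Synthetic "confident" logits: background noise plus a strongly preferred token,
# mimicking a peaked next-token distribution from a real model
logits = rng.standard_normal((rows, vocab))
correct = rng.integers(0, vocab, rows)
logits[np.arange(rows), correct] += 8.0

# Model quantization error on the output projection as small additive logit noise
noise = rng.normal(0.0, 0.3, logits.shape)

# Fraction of positions where the greedy (argmax) token changes
flips = (logits.argmax(-1) != (logits + noise).argmax(-1)).mean()
print(f"greedy-token flip rate under perturbation: {flips:.1%}")
```

When the top-1 margin is large relative to the noise, the flip rate stays near zero, which is consistent with the minimal capability loss observed at Q2_K; whether real quantization error is this well-behaved for a given model is an empirical question.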
reacted to kadirnar's post with 👀 about 1 month ago
Researchers developed Sonic AI enabling precise facial animation from speech cues 🎧 Decouples head/expression control via audio tone analysis + time-aware fusion for natural long-form synthesis
reacted to mkurman's post with 👍 about 1 month ago
Blurred-Thoughts Supervised Fine-Tuning (BT-SFT) 🤖

Can we teach a model to think completely on its own without reinforcement learning? Actually, yes.

We can do straightforward supervised fine-tuning using a relatively simple trick: blurring a part of CoT thoughts. But why is this effective?

We observed that various models differ in their thinking processes, and fine-tuning one model on another model’s thoughts (CoT) can sometimes be inefficient—often resulting in the model simply memorizing reasoning rather than learning how to actually think.

I discovered that this process can still be efficient if we clearly indicate when the model should start and stop thinking and uncover only a part of CoT and the expected answer, blurring the other part of CoT. This approach allows the model to learn only a portion of the thought process while still arriving at an expected answer.

To demonstrate this, you can watch my experimental BT-SFT on meditsolutions/Llama-3.2-SUN-2.5B-chat model, which was fine-tuned on 151 million tokens from the Magpie-Align/Magpie-Reasoning-V2-250K-CoT-Deepseek-R1-Llama-70B dataset.

Enjoy! 🚀

PS. If you were curious enough to read this, leave me a comment. It's always nice to chat with open-minded and intelligent ppl.
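The post doesn't include code, but the described "blurring" maps naturally onto the standard label-masking convention for supervised fine-tuning, where tokens labeled -100 are excluded from the loss. The sketch below is a hypothetical illustration (the function name, parameters, and blurring strategy are assumptions, not the author's implementation):

```python
import random

IGNORE_INDEX = -100  # tokens with this label are excluded from the loss (HF convention)

def blur_cot_labels(input_ids, cot_start, cot_end, blur_frac=0.5, seed=0):
    """Return SFT labels where a random contiguous chunk of the chain-of-thought
    span [cot_start, cot_end) is 'blurred' (masked out of the loss), so the model
    learns only part of the thought process plus the expected answer."""
    rng = random.Random(seed)
    labels = list(input_ids)
    span = cot_end - cot_start
    blur_len = int(span * blur_frac)
    start = cot_start + rng.randrange(span - blur_len + 1)
    for i in range(start, start + blur_len):
        labels[i] = IGNORE_INDEX
    return labels

# Toy sequence: [prompt][<think> ... </think>][answer]
ids = list(range(20))
labels = blur_cot_labels(ids, cot_start=5, cot_end=15, blur_frac=0.5)
print(labels)
```

The prompt and answer tokens stay supervised as usual; only the blurred chunk of the CoT drops out of the loss, which matches the idea of uncovering part of the reasoning while hiding the rest.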
reacted to Kseniase's post with 🚀 about 2 months ago
7 Open-source Methods to Improve Video Generation and Understanding

The AI community is making great strides toward achieving the full potential of multimodality in video generation and understanding. Last week, studies showed that working with videos is now one of the main focuses for improving AI models. Another highlight of the week is that open source, once again, proves its value. For those who were impressed by DeepSeek-R1, we're with you!

Today, we’re combining these two key focuses and bringing you a list of open-source methods for better video generation and understanding:

1. VideoLLaMA 3 model: Excels in various video and image tasks thanks to its vision-centric training approach. VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video Understanding (2501.13106)

2. FILMAGENT framework assigns roles to multiple AI agents, like a director, screenwriter, actor, and cinematographer, to automate the filmmaking process in 3D virtual environments. FilmAgent: A Multi-Agent Framework for End-to-End Film Automation in Virtual 3D Spaces (2501.12909)

3. Improving Video Generation with Human Feedback (2501.13918) proposes a new VideoReward Model and approach that uses human feedback to refine video generation models.

4. DiffuEraser video inpainting model, based on stable diffusion, is designed to fill in missing areas with detailed, realistic content and to ensure consistent structures across frames. DiffuEraser: A Diffusion Model for Video Inpainting (2501.10018)

5. MAGI is a hybrid video generation model that combines masked and causal modeling. Its key innovation, Complete Teacher Forcing (CTF), conditions masked frames on fully visible frames. Taming Teacher Forcing for Masked Autoregressive Video Generation (2501.12389)

6. Go-with-the-Flow: Motion-Controllable Video Diffusion Models Using Real-Time Warped Noise (2501.08331) proposes motion control, allowing users to guide how objects or the camera move in generated videos. Its noise warping algorithm replaces random noise in videos with structured noise based on motion info.

7. Video Depth Anything model estimates depth consistently in super-long videos (several minutes or more) without sacrificing quality or speed. Video Depth Anything: Consistent Depth Estimation for Super-Long Videos (2501.12375)
reacted to KnutJaegersberg's post with 👀 about 2 months ago
Evolution and The Knightian Blindspot of Machine Learning


The paper discusses machine learning's limitations in addressing Knightian Uncertainty (KU), highlighting the fragility of models like reinforcement learning (RL) in unpredictable, open-world environments. KU refers to uncertainty that can't be quantified or predicted, a challenge that RL fails to handle due to its reliance on fixed data distributions and limited formalisms.


### Key Approaches:

1. **Artificial Life (ALife):** Simulating diverse, evolving systems to generate adaptability, mimicking biological evolution's robustness to unpredictable environments.

2. **Open-Endedness:** Creating AI systems capable of continuous innovation and adaptation, drawing inspiration from human creativity and scientific discovery.

3. **Revising RL Formalisms:** Modifying reinforcement learning (RL) models to handle dynamic, open-world environments by integrating more flexible assumptions and evolutionary strategies.

These approaches aim to address ML’s limitations in real-world uncertainty and move toward more adaptive, general intelligence.

https://arxiv.org/abs/2501.13075
reacted to fantos's post with 🔥 about 2 months ago
🚀 HuggingFace Spaces Ranking Tracker - Your Complete AI Trend Analytics!

Introducing the Spaces Ranking Tracker, a comprehensive analytics dashboard that tracks and analyzes every AI application in the HuggingFace ecosystem.

✨ Key Features:
• Real-time tracking of daily ranking changes over 30 days
• Detailed analysis of top 100 trending spaces
• User-based integrated score visualization
• One-click access to space details
• Interactive rank change graphs

📊 Dashboard Components:
1. Main Dashboard
- Daily rank trend graphs
- Top 20 creators' combined score chart
- Detailed space information cards
- Real-time trending score updates

2. Space Detailed Analysis
- Creation date, current rank, and trending score
- 30-day ranking history
- Direct space access
- Custom color coding for intuitive rank display

3. Interactive Features
- Custom filtering options
- Sorting by various metrics
- Detailed performance statistics
- Comprehensive trending scores
- Historical data tracking

🎯 How to Use:
• Monitor latest AI community trends
• Track your project's performance
• Discover popular AI demos
• Analyze competing projects
• Follow AI ecosystem dynamics

Stay on top of every movement in the HuggingFace ecosystem with daily ranking updates! 👉 Try it now!

🔗 Access Dashboard: fantos/Ranking-Tracker
#HuggingFace #AI #DataVisualization #TrendAnalysis #AITrends
reacted to bartowski's post with 👍 about 2 months ago
Looks like Q4_0_N_M file types are going away

Before you panic: there's a new "preferred" method, which is online (I prefer the term on-the-fly) repacking. If you download Q4_0 and your setup can benefit from repacking the weights into interleaved rows (what Q4_0_4_4 was doing), it will do that automatically and give you similar performance (minor losses, I think, due to using intrinsics instead of assembly, but intrinsics are more maintainable)

You can see the reference PR here:

https://github.com/ggerganov/llama.cpp/pull/10446

So if you update your llama.cpp past that point, you won't be able to run Q4_0_4_4 (unless they add backwards compatibility back), but Q4_0 should be the same speeds (though it may currently be bugged on some platforms)

As such, I'll stop making those newer model formats soon, probably end of this week unless something changes, but you should be safe to download Q4_0 quants and use those!

Also, IQ4_NL supports repacking, though not in as many shapes yet, but it should get a respectable speed-up on ARM chips. The PR for that can be found here: https://github.com/ggerganov/llama.cpp/pull/10541

Remember, these are not meant for Apple silicon since those use the GPU and don't benefit from the repacking of weights
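To see what "repacking the weights into interleaved rows" means in layout terms, here's a toy NumPy sketch. It is not llama.cpp's actual code (real Q4_0 repacking operates on quantized blocks with per-block scales); it only illustrates the idea that blocks from several consecutive rows are laid out adjacently so a SIMD dot-product kernel can load one contiguous chunk and accumulate several output rows at once:

```python
import numpy as np

def repack_rows(w, group=4, block=4):
    """Interleave `group` consecutive rows block-by-block (the layout idea
    behind Q4_0_4_4-style formats): for each column block, the corresponding
    blocks of all `group` rows become contiguous in memory."""
    rows, cols = w.shape
    assert rows % group == 0 and cols % block == 0
    x = w.reshape(rows // group, group, cols // block, block)
    # Swap the (row-in-group, col-block) axes so same-position blocks sit adjacent
    return x.transpose(0, 2, 1, 3).reshape(rows // group, cols * group)

w = np.arange(32).reshape(4, 8)  # 4 rows of 8 weights, values 0..31 for clarity
packed = repack_rows(w, group=4, block=4)
print(packed)
```

With this layout a kernel walks one packed row linearly and produces four outputs per pass, which is why the repacked formats were faster on ARM CPUs in the first place.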
reacted to burtenshaw's post with 🚀 about 2 months ago
Manic few days in open-source AI, with game-changing developments all over the place. Here's a round-up of the resources:

- The science team at @huggingface reproduced and open-sourced DeepSeek R1. https://github.com/huggingface/open-r1
- @qwen released a series of models with 1 million token context! https://qwenlm.github.io/blog/qwen2.5-1m/
- SmolVLM got even smaller, with completely new variants at 256M and 500M https://huggingface.co/blog/smolervlm

There's so much you could do with these developments. Especially combining them together into agentic applications or fine-tuning them on your use case.
reacted to haritzpuerto's post with 👍 about 2 months ago
I'm excited to announce that my internship paper at Parameter Lab was accepted to Findings of #NAACL2025 🎉
TLDR: Determining that an LLM was trained on a single sentence might not be possible 😥, but it is possible for large enough amounts of tokens, such as long documents or multiple documents! 🤯
Scaling Up Membership Inference: When and How Attacks Succeed on Large Language Models (2411.00154)
🔗 https://github.com/parameterlab/mia-scaling