PZ PRO

philipp-zettl

AI & ML interests

NLP/CV/Multimodal learning

Recent Activity

updated a model about 13 hours ago
philipp-zettl/T5-small-tinyqa
published a model about 13 hours ago
philipp-zettl/T5-small-tinyqa
liked a Space about 18 hours ago
elismasilva/mixture-of-diffusers-sdxl-tiling

Organizations

Blog-explorers, easybits

philipp-zettl's activity

reacted to schuler's post with 🔥 1 day ago
📢 New Research Alert: Making Language Models Smaller & Smarter!

Thrilled to share the latest technical report demonstrating how to reduce language model parameters by 77% while maintaining performance.

The secret? Grouped pointwise convolutions. Yes. We brought a method from computer vision to the transformers arena.
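
As a rough illustration of why grouping saves parameters: a pointwise (1x1) convolution is a dense channel-mixing matrix, and splitting the channels into independent groups divides its weight count by the number of groups. The sketch below only counts weights; the layer sizes and group count are hypothetical, biases are ignored, and it does not reproduce the report's architecture or its 77% figure.

```python
def pointwise_params(channels_in: int, channels_out: int, groups: int = 1) -> int:
    """Weight count of a pointwise (1x1) convolution, ignoring biases."""
    if channels_in % groups or channels_out % groups:
        raise ValueError("channel counts must be divisible by groups")
    # Each group maps its own slice of inputs to its own slice of outputs.
    return (channels_in // groups) * (channels_out // groups) * groups


# Hypothetical layer sizes, chosen only for illustration.
dense = pointwise_params(1024, 4096)
grouped = pointwise_params(1024, 4096, groups=4)
saving = 1 - grouped / dense

print(f"dense={dense} grouped={grouped} saving={saving:.0%}")
```

With `groups=4` the grouped layer keeps a quarter of the dense layer's weights; how much a full model saves depends on which layers are replaced.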

🔑 Key Findings:
• 77% parameter reduction.
• Maintained model capabilities.
• Improved generalization.

Paper: https://www.researchgate.net/publication/388835829_SAVING_77_OF_THE_PARAMETERS_IN_LARGE_LANGUAGE_MODELS_TECHNICAL_REPORT
Code: https://github.com/joaopauloschuler/less-parameters-llm
reacted to hexgrad's post with 🔥 12 days ago
reacted to mitkox's post with 🚀 15 days ago
llama.cpp is 26.8% faster than ollama.
I upgraded both and, using the same settings, I am running the same DeepSeek R1 Distill 1.5B on the same hardware. It's an apples-to-apples comparison.

Total duration:
llama.cpp 6.85 sec <- 26.8% faster
ollama 8.69 sec

Breakdown by phase:
Model loading
llama.cpp 241 ms <- 2x faster
ollama 553 ms

Prompt processing
llama.cpp 416.04 tokens/s with an eval time of 45.67 ms <- 10x faster
ollama 42.17 tokens/s with an eval time of 498 ms

Token generation
llama.cpp 137.79 tokens/s with an eval time of 6.62 sec <- 13% faster
ollama 122.07 tokens/s with an eval time of 7.64 sec
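
The "x faster" labels above can be re-derived from the raw numbers. The short sketch below does only that arithmetic, using the figures quoted in the post; the helper name `ratio` is an illustrative choice and nothing here re-runs the benchmark.

```python
def ratio(larger: float, smaller: float) -> float:
    """How many times bigger `larger` is than `smaller`."""
    return larger / smaller


# Durations: lower is better, so divide ollama by llama.cpp.
total = ratio(8.69, 6.85)        # total duration in seconds
loading = ratio(553, 241)        # model loading in milliseconds

# Throughput: higher is better, so divide llama.cpp by ollama.
prompt = ratio(416.04, 42.17)        # prompt processing, tokens/s
generation = ratio(137.79, 122.07)   # token generation, tokens/s

for name, value in [("total", total), ("loading", loading),
                    ("prompt", prompt), ("generation", generation)]:
    print(f"{name}: {value:.2f}x ({(value - 1) * 100:.1f}% faster)")
```

The results agree with the post's labels to within rounding (the total comes out at about 26.9%, which the post states as 26.8%).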

llama.cpp is LLM inference in C/C++; ollama adds abstraction layers and marketing.

Make sure you own your AI. AI in the cloud is not aligned with you; it's aligned with the company that owns it.
reacted to fdaudens's post with ❤️ 15 days ago
Yes, DeepSeek R1's release is impressive. But the real story is what happened in just 7 days after:

- Original release: 8 models, 540K downloads. Just the beginning...

- The community turned those open-weight models into +550 NEW models on Hugging Face. Total downloads? 2.5M, nearly 5X the originals.

The reason? DeepSeek models are open-weight, letting anyone build on top of them. Interesting to note that the community focused on quantized versions for better efficiency & accessibility. They want models that use less memory, run faster, and are more energy-efficient.

When you empower builders, innovation explodes. For everyone. 🚀

The most popular community model? @bartowski's DeepSeek-R1-Distill-Qwen-32B-GGUF version, with 1M downloads alone.
reacted to lewtun's post with 🚀 about 2 months ago
We outperform Llama 70B with Llama 3B on hard math by scaling test-time compute 🔥

How? By combining step-wise reward models with tree search algorithms :)

We show that smol models can match or exceed the performance of their much larger siblings when given enough "time to think"

We're open sourcing the full recipe and sharing a detailed blog post.

In our blog post we cover:

📈 Compute-optimal scaling: How we implemented DeepMind's recipe to boost the mathematical capabilities of open models at test-time.

🎄 Diverse Verifier Tree Search (DVTS): An unpublished extension we developed to the verifier-guided tree search technique. This simple yet effective method improves diversity and delivers better performance, particularly at large test-time compute budgets.

🧭 Search and Learn: A lightweight toolkit for implementing search strategies with LLMs and built for speed with vLLM
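
To make "step-wise reward model plus tree search" concrete, here is a toy sketch of one simple search strategy, beam search guided by a scoring function. It is not the search-and-learn implementation and it is not DVTS: the "reward model" is a hand-written function, the "steps" are digits, and every name in the block is hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Path = Tuple[int, ...]  # a partial solution: the steps chosen so far


def beam_search(
    propose: Callable[[Path], Sequence[int]],
    score: Callable[[Path], float],
    depth: int,
    beam_width: int,
) -> Path:
    """Extend every kept path by one step, score it, keep the best few."""
    beams: List[Path] = [()]
    for _ in range(depth):
        candidates = [path + (step,) for path in beams for step in propose(path)]
        # The scorer plays the role of a step-wise reward model.
        candidates.sort(key=score, reverse=True)
        beams = candidates[:beam_width]
    return beams[0]


TARGET = 10  # toy task: choose three digits (0-5) that sum to TARGET


def propose_digit(path: Path) -> Sequence[int]:
    """Stand-in for an LLM proposing next steps."""
    return range(6)


def toy_reward(path: Path) -> float:
    """Stand-in for a reward model: closer to TARGET scores higher."""
    return -abs(TARGET - sum(path))


best = beam_search(propose_digit, toy_reward, depth=3, beam_width=2)
print(best, sum(best))
```

Raising `beam_width` or `depth` spends more compute at inference time, which is the knob the post refers to as test-time compute.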

Here are the links:

- Blog post: HuggingFaceH4/blogpost-scaling-test-time-compute

- Code: https://github.com/huggingface/search-and-learn

Enjoy!
reacted to lhoestq's post with ❤️ about 2 months ago
Made an HF Dataset editor à la Google Sheets here: lhoestq/dataset-spreadsheets

With Dataset Spreadsheets:
✏️ Edit datasets in the UI
🔗 Share link with collaborators
🐍 Use locally in DuckDB or Python

Available for the 100,000+ parquet datasets on HF :)
reacted to merve's post with ❤️ 2 months ago
This week in open-source AI was insane 🤠 A small recap 🕺🏻 merve/dec-6-releases-67545caebe9fc4776faac0a3

Multimodal 🖼️
> Google shipped PaliGemma 2, a new iteration of PaliGemma with more sizes: 3B, 10B and 28B, with pre-trained and captioning variants
> OpenGVLab released InternVL2, seven new vision LMs in different sizes, with a SOTA checkpoint under the MIT license ✨
> Qwen team at Alibaba released the base models of Qwen2VL with 2B, 7B and 72B checkpoints

LLMs 💬
> Meta released a new iteration of Llama 70B, Llama3.3-70B, trained further
> EuroLLM-9B-Instruct is a new multilingual LLM for European languages with Apache 2.0 license 🔥
> Dataset: CohereForAI released GlobalMMLU, multilingual version of MMLU with 42 languages with Apache 2.0 license
> Dataset: QwQ-LongCoT-130K is a new dataset to train reasoning models
> Dataset: FineWeb2 just landed with multilinguality update! 🔥 nearly 8TB pretraining data in many languages!

Image/Video Generation 🖼️
> Tencent released HunyuanVideo, a new photorealistic video generation model
> OminiControl is a new editing/control framework for image generation models like Flux

Audio 🔊
> Indic-Parler-TTS is a new text2speech model made by community
reacted to christopher's post with 🔥 2 months ago
posted an update 2 months ago
alias rm='rm -i'

Better safe than sorry.
reacted to andito's post with 🔥 2 months ago
SmolVLM speeding locally on a laptop thanks to mlx-vlm and @Gradio! Try it with two lines:
pip install git+https://github.com/andimarafioti/mlx-vlm.git@stream-generate-fix
python -m mlx_vlm.chat_ui --model mlx-community/SmolVLM-Instruct-8bit

Gotta love the MLX community! Big thanks to @pcuenq and @prince_canuma !
reacted to merve's post with ❤️ 3 months ago
your hugging face profile now has your recent activities 🤗
replied to their post 4 months ago
replied to their post 4 months ago

I think you got me wrong there. I'm mostly concerned about image generation LoRAs that are trained on your own likeness or, for instance, on pictures of children.
Gatekeeping the secret sauce for base models is different, and I totally agree with you on that part.

replied to their post 4 months ago

I'm more concerned about bad actors using them to create content that might harm you or put you in a bad spot by creating visual content with your face.
For instance, to blackmail you or harm your reputation.

I am for sure a big supporter of open source and publish all the things I have the rights to. Yet, I wouldn't publish a LoRA that is trained on my face.

posted an update 4 months ago
This is probably a very hot take, but here goes nothing.

Incredibly accurate LoRAs are emerging for high-quality models like FLUX, and services like fal.ai offer training within single-digit minutes, e.g. 2 min per 1000 iterations.

Why the hell are people publishing private LoRAs as public models?!
Take a look at this listing: https://huggingface.co/models?other=base_model:adapter:black-forest-labs%2FFLUX.1-dev&sort=created

I would expect people who hold an HF account to show some kind of forward thinking. Heck, do you really want to give anyone the power to create ultra-realistic images of yourself?!

Didn't we learn anything from social media?
I am puzzled.
reacted to clem's post with ❤️ 4 months ago
Open-source AI creates healthy competition in a field where natural tendencies lead to extreme concentration of power. Imagine a world where only one or two companies could build software. This is the biggest risk and ethical challenge of them all IMO. Let's fight this!
reacted to reach-vb's post with 🔥 4 months ago
Multimodal Ichigo Llama 3.1 - Real Time Voice AI 🔥

> WhisperSpeech X Llama 3.1 8B
> Trained on 50K hours of speech (7 languages)
> Continually trained for 45hrs on 10x A1000s
> MLS -> WhisperVQ tokens -> Llama 3.1
> Instruction tuned on 1.89M samples
> 70% speech, 20% transcription, 10% text
> Apache 2.0 licensed ⚡

Architecture:
> WhisperSpeech/ VQ for Semantic Tokens
> Llama 3.1 8B Instruct for Text backbone
> Early fusion (Chameleon)

I'm super bullish on HomeBrew/ Jan and early fusion, audio and text, multimodal models!

(P.S. Play with the demo on Hugging Face: jan-hq/Ichigo-llama3.1-s-instruct)
reacted to tomaarsen's post with 🔥 4 months ago
📣 Sentence Transformers v3.2.0 is out, marking the biggest release for inference in 2 years! 2 new backends for embedding models: ONNX (+ optimization & quantization) and OpenVINO, allowing for speedups up to 2x-3x AND Static Embeddings for 500x speedups at 10-20% accuracy cost.

1️⃣ ONNX Backend: This backend uses the ONNX Runtime to accelerate model inference on both CPU and GPU, reaching up to 1.4x-3x speedup depending on the precision. We also introduce 2 helper methods for optimizing and quantizing models for (much) faster inference.
2️⃣ OpenVINO Backend: This backend uses Intel's OpenVINO instead, outperforming ONNX in some situations on CPU.

Usage is as simple as SentenceTransformer("all-MiniLM-L6-v2", backend="onnx"). Does your model not have an ONNX or OpenVINO file yet? No worries - it'll be autoexported for you. Thank me later 😉

🔢 Another major new feature is Static Embeddings: think word embeddings like GloVe and word2vec, but modernized. Static Embeddings are bags of token embeddings that are summed together to create text embeddings, allowing for lightning-fast embeddings that don't require any neural networks. They're initialized in one of 2 ways:

1️⃣ via Model2Vec, a new technique for distilling any Sentence Transformer models into static embeddings. Either via a pre-distilled model with from_model2vec or with from_distillation where you do the distillation yourself. It'll only take 5 seconds on GPU & 2 minutes on CPU, no dataset needed.
2️⃣ Random initialization. This requires finetuning, but finetuning is extremely quick (e.g. I trained with 3 million pairs in 7 minutes). My final model was 6.6% worse than bge-base-en-v1.5, but 500x faster on CPU.
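
The "bag of token embeddings summed together" idea can be sketched in a few lines of plain Python. This is a toy under stated assumptions, not the Sentence Transformers implementation: the three-dimensional vectors, the whitespace tokenizer and the `embed` helper are all made up for illustration.

```python
from typing import Dict, List

# Hypothetical lookup table: one fixed vector per token, no neural network.
TOKEN_EMBEDDINGS: Dict[str, List[float]] = {
    "chess": [1.0, 0.0, 2.0],
    "is": [0.0, 1.0, 0.0],
    "fun": [0.5, 0.5, 1.0],
}


def embed(text: str, table: Dict[str, List[float]]) -> List[float]:
    """Sum the vectors of all known tokens in `text`."""
    dim = len(next(iter(table.values())))
    total = [0.0] * dim
    for token in text.lower().split():  # naive whitespace tokenizer (assumption)
        vector = table.get(token)
        if vector is None:
            continue  # unknown tokens contribute nothing
        for i, value in enumerate(vector):
            total[i] += value
    return total


sentence_vector = embed("Chess is fun", TOKEN_EMBEDDINGS)
print(sentence_vector)
```

Because embedding a text is just table lookups and additions, it runs quickly on CPU; the accuracy trade-off mentioned in the post comes from ignoring word order and context.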

Full release notes: https://github.com/UKPLab/sentence-transformers/releases/tag/v3.2.0
Documentation on Speeding up Inference: https://sbert.net/docs/sentence_transformer/usage/efficiency.html
posted an update 4 months ago
🚀 Finishing up the prototype of my weekend project called ChessPT 🚀

- The game state is now being rendered. This makes it easier to come up with your own moves.
- The model space philipp-zettl/ChessPT was updated to provide an interactive mode.
- The space is currently running v0.4 of philipp-zettl/chessPT
- New updates will come this week.
- Training runs will be logged under https://wandb.ai/philipp-zettl/chessPT/

**Note**: The model is still not performing at the level I want. It predicts invalid moves (according to the game state) too frequently. In addition, the post-processing step is a little faulty, so you might end up in a state where the model didn't provide a next move.
posted an update 4 months ago
Version 0.2a of ChessPT is currently training.

I decided to hold off on the actual v1.0 until I have a better understanding of where I want to go and have successfully trained the first fine-tune.

I'm playing around with a loss that is heavily influenced by the idea of reinforcement.

Basically, I'm punishing the model for generating invalid PGN strings.
The current approach bets on simplicity:

-2: wrong characters in output
-1: invalid PGN string, but valid charset
0: valid PGN string, incl. valid moves
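
The three-level scheme above could be sketched roughly as follows. This is not the ChessPT training code: the allowed character set and the regular expression are assumptions, and the default validator only checks move-text syntax. Checking that moves are legal for the game state would need a real chess library, which can be plugged in through the hypothetical `is_valid` parameter.

```python
import re
from typing import Callable

# Assumed character set for PGN move text (not taken from the project).
ALLOWED_CHARS = set("abcdefgh0123456789KQRBNxO-+#=. ")

# Syntax-only pattern for one move, e.g. "e4", "Nf3", "exd5", "O-O", "e8=Q+".
MOVE = r"(?:O-O(?:-O)?|[KQRBN]?[a-h]?[1-8]?x?[a-h][1-8](?:=[QRBN])?)[+#]?"
MOVETEXT = re.compile(rf"^(?:\d+\.\s*{MOVE}(?:\s+{MOVE})?\s*)+$")


def looks_like_pgn(text: str) -> bool:
    """Syntactic check only; does NOT verify that moves are legal."""
    return MOVETEXT.match(text) is not None


def pgn_penalty(text: str, is_valid: Callable[[str], bool] = looks_like_pgn) -> int:
    """Return -2, -1 or 0 following the three levels listed above."""
    if any(ch not in ALLOWED_CHARS for ch in text):
        return -2  # wrong characters in output
    if not is_valid(text):
        return -1  # valid charset, but not a valid PGN string
    return 0       # accepted by the validator


print(pgn_penalty("1. e4 e5 2. Nf3 Nc6"))
print(pgn_penalty("1. e4 e5 2."))
print(pgn_penalty("1. e4 z9!"))
```

How such a score is folded into the loss (for example as a weight on the language-modelling loss) is a separate design decision that the post does not spell out.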


GPT-4o helped me with the implementation. I'm expecting some errors in the implementation.

The training should finish in roughly 14h; I will upload the new weights then.
But I still need to run extensive tests on this loss before I can happily call it v0.2 ✌️

BTW, I'm also building a space for the model, which will be published tonight after adding descriptions and a nice interface. ♟️

philipp-zettl/chessPT
philipp-zettl/ChessPT