Doron Adler

Norod78

AI & ML interests

Fooling around with generative machine learning models.

Recent Activity

liked a model about 12 hours ago
ivrit-ai/whisper-large-v3-turbo-ggml
liked a dataset about 14 hours ago
teknium/OpenHermes-2.5
liked a Space 1 day ago
ginigen/text3d-r1

Organizations

Spaces-explorers · Gradio-Blocks-Party · Yam Peleg · ZeroGPU Explorers · Mixtiles · Social Post Explorers · Hugging Face Discord Community · Endless Technologies Ltd.

Norod78's activity

reacted to schuler's post with 👍 1 day ago
📢 New Research Alert: Making Language Models Smaller & Smarter!

Thrilled to share the latest technical report demonstrating how to reduce language model parameters by 77% while maintaining performance.

The secret? Grouped pointwise convolutions. Yes, we brought a method from computer vision into the transformer arena (a minimal sketch of the idea follows below).

🔑 Key Findings:
• 77% parameter reduction.
• Maintained model capabilities.
• Improved generalization.

Paper: https://www.researchgate.net/publication/388835829_SAVING_77_OF_THE_PARAMETERS_IN_LARGE_LANGUAGE_MODELS_TECHNICAL_REPORT
Code: https://github.com/joaopauloschuler/less-parameters-llm
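For readers who want to see the mechanism, here is a minimal PyTorch sketch of a grouped pointwise convolution standing in for a dense projection. The dimensions and the `groups` value are illustrative, not the report's configuration.

```python
import torch
import torch.nn as nn

# A pointwise (kernel_size=1) convolution is equivalent to a per-position
# linear projection. Setting groups > 1 splits the channels into independent
# slices, cutting the weight count by roughly a factor of `groups`.
d_model, groups = 768, 4  # illustrative sizes, not the report's config

dense = nn.Conv1d(d_model, d_model, kernel_size=1)
grouped = nn.Conv1d(d_model, d_model, kernel_size=1, groups=groups)

def n_params(m: nn.Module) -> int:
    return sum(p.numel() for p in m.parameters())

print(n_params(dense))    # 590,592 weights + biases
print(n_params(grouped))  # 148,224, roughly 1/groups of the dense layer

x = torch.randn(2, d_model, 16)  # (batch, channels, sequence)
assert dense(x).shape == grouped(x).shape  # same output shape either way
```

The trade-off is that channels in different groups no longer mix, which is why grouped layers are typically paired with some form of channel interleaving or shuffling between blocks.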
upvoted an article 3 days ago
Hugging Face partners with Wiz Research to Improve AI Security

reacted to grimjim's post with 👍 3 days ago
I've made yet another merge of reasoning models with incremental gains on the current Open LLM leaderboard.
open-llm-leaderboard/open_llm_leaderboard

Merging the DeepSeek R1 distillation of Llama 3.1 8B into a prior best merge (at 10% task arithmetic weight, using the Llama 3.1 8B base model rather than the instruct model as the reference; sketched below) resulted in a slightly lower IFEval but higher results on every other benchmark, save for MMLU-PRO, which went down only marginally. MATH Lvl 5 and GPQA went up palpably.
grimjim/DeepSauerHuatuoSkywork-R1-o1-Llama-3.1-8B

This is my best Llama 3.1 8B merge result to date. The R1 distillation itself scored quite badly on its own, which suggests another case of unexpected output formatting (reflected in IFEval) dragging down evaluation results and obscuring the underlying strength of a model.

The model can also be used to generate roleplay completions; based on informal testing, its bias toward problem-solving will subtly color narration.
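For context, here is a minimal sketch of the task arithmetic step described above. The checkpoint paths are placeholders, and merges of this kind are usually performed with dedicated tooling (e.g. mergekit) rather than raw state dicts.

```python
import torch

# Placeholder paths; these are not the actual checkpoint files.
base = torch.load("llama-3.1-8b-base.pt")       # reference: base, not instruct
distill = torch.load("r1-distill-llama-8b.pt")  # the R1 distillation
prior = torch.load("prior-best-merge.pt")       # the earlier best merge

alpha = 0.10  # the 10% task arithmetic weight from the post

# Task vector = fine-tuned weights minus the reference weights; the merge
# adds a scaled copy of that vector to the prior merge. Keys are assumed
# to line up, since all three models share the Llama 3.1 8B architecture.
merged = {
    name: w + alpha * (distill[name] - base[name])
    for name, w in prior.items()
}

torch.save(merged, "merged-model.pt")
```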
New activity in apple/coreml-mobileclip 3 days ago
reacted to merve's post with 🚀 5 days ago