Mariusz Kurman PRO

mkurman

AI & ML interests

AI Tech Lead | MD

mkurman's activity

updated a model about 4 hours ago

mkurman/Llama-3.2-MedIT-3B-R1

Updated about 4 hours ago

published a model about 7 hours ago

mkurman/Llama-3.2-MedIT-3B-R1

Updated about 4 hours ago

reacted to CultriX's post with 👍❤️ about 13 hours ago

Post

1593

Final upgrade to the Multi-Agent Task Completion Space: CultriX/MultiAgent-CodeTask.

It now includes:
- A live stream of the progress being made on the task (see included video).
- The following components:
1. Automatic prompt optimization.
2. An orchestrator that dynamically decides which agent to call, taking feedback from a human into account (human-in-the-loop).
3. A coding agent that completes the task.
4. A code-reviewing agent that iteratively gives feedback on the code generated by the coding agent until the code meets the required criteria, at which point it is approved.
5. A testing agent that tests the approved code or explains how to test it.
6. A documentation agent that writes documentation and a help message for the approved and tested code.
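
The generate, review, test, and document flow in components 3 to 6 can be sketched roughly as follows. This is only an illustration, not the Space's actual code: `run_pipeline`, the agent callables, and `max_rounds` are hypothetical names, the agents are plain stub functions rather than LLM calls, and prompt optimization, the orchestrator, and the human-in-the-loop step are left out.

```python
def run_pipeline(task, coder, reviewer, tester, documenter, max_rounds=3):
    """Loop coder -> reviewer until approval, then test and document.

    All agent arguments are hypothetical callables standing in for LLM agents.
    """
    feedback = None
    code = None
    for round_no in range(1, max_rounds + 1):
        code = coder(task, feedback)               # (re)generate code, using feedback if any
        approved, feedback = reviewer(task, code)  # reviewer returns (bool, feedback text)
        if approved:
            return {
                "approved": True,
                "rounds": round_no,
                "code": code,
                "tests": tester(code),
                "docs": documenter(code),
            }
    # Review budget exhausted without approval: skip testing and documentation.
    return {"approved": False, "rounds": max_rounds, "code": code}


# Stub agents for demonstration only.
def stub_coder(task, feedback):
    return "v1" if feedback is None else "v2"

def stub_reviewer(task, code):
    return (code == "v2", "please handle edge cases")

def stub_tester(code):
    return f"tests passed for {code}"

def stub_documenter(code):
    return f"usage notes for {code}"

result = run_pipeline("add two numbers", stub_coder, stub_reviewer,
                      stub_tester, stub_documenter)
```

With these stubs the first draft is rejected, the second is approved, and only the approved code reaches the tester and documenter.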

posted an update 1 day ago

Post

1472

I've been working on something cool: GRPO with an LLM evaluator that can also perform SFT on the feedback data, if you want. Check it out 😊

Any 🌟 are more than welcome 🤗

https://github.com/mkurman/grpo-llm-evaluator

posted an update 7 days ago

Post

1523

Blurred-Thoughts Supervised-Finetuning 🙈

After hours of working with GitHub Copilot to organize the code, I'm keen to announce the release of Blurred Thoughts Supervised-Finetuning (BT-SFT), a new method for fine-tuning LLMs to produce more diverse and creative responses.

BT-SFT introduces:
✅ A smart tokenization method that randomly masks tokens within <think> ... </think> tags, encouraging the model to generate diverse responses that align better with its probability distribution instead of memorizing the thought process from distilled data.
✅ A reward function that ensures responses are well-structured.
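
To make the masking idea concrete, here is a minimal sketch. It is not the repository's implementation: `blur_thoughts`, `mask_token`, and `mask_prob` are hypothetical names, the input is assumed to be an already-split token list, and the post does not say whether masking is applied to inputs, labels, or both, so this version simply swaps in a placeholder token.

```python
import random

def blur_thoughts(tokens, mask_token="<mask>", mask_prob=0.15, seed=None):
    """Randomly replace tokens between <think> and </think> with mask_token.

    Tokens outside the thinking span, and the tags themselves, are left untouched.
    """
    rng = random.Random(seed)
    inside = False
    blurred = []
    for tok in tokens:
        if tok == "<think>":
            inside = True
            blurred.append(tok)
        elif tok == "</think>":
            inside = False
            blurred.append(tok)
        elif inside and rng.random() < mask_prob:
            blurred.append(mask_token)  # hide this reasoning token
        else:
            blurred.append(tok)
    return blurred


example = ["Q", "<think>", "a", "b", "c", "</think>", "ans"]
fully_blurred = blur_thoughts(example, mask_prob=1.0)
untouched = blur_thoughts(example, mask_prob=0.0)
```

Setting `mask_prob` to 1.0 hides the whole thought span, while 0.0 leaves the sequence unchanged; values in between blur only part of the reasoning.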

Explore and contribute to the project available in my GitHub repository:
https://github.com/mkurman/blurred-thoughts-SFT

Keep me updated on your experiments with BT-SFT! 🐐

updated a model 7 days ago

mkurman/Llama-3.2-MedIT-SUN-2.5B-BT-GRPO

Updated 7 days ago • 177 • 2

updated a dataset 7 days ago

mkurman/simplescaling-s1K-R1

Viewer • Updated 7 days ago • 1k • 32

published a dataset 7 days ago

mkurman/simplescaling-s1K-R1

Viewer • Updated 7 days ago • 1k • 32

liked a dataset 7 days ago

simplescaling/s1K

Viewer • Updated 4 days ago • 1k • 3.06k • 167

New activity in mkurman/Llama-3.2-MedIT-SUN-2.5B-BT-GRPO 8 days ago

Issue with Padding

#1 opened 10 days ago by akashD22

reacted to nicolay-r's post with 🔥 12 days ago

Post

1613

📢 The distilled LLaMA-3.1-8B version of DeepSeek-R1 is available alongside the one based on Qwen

📙 Notebook for using it to reason over a series of data 🧠:
https://github.com/nicolay-r/nlp-thirdgate/blob/master/tutorials/llm_deep_seek_7b_distill_llama3.ipynb

Loading using the pipeline API of the transformers library:
https://github.com/nicolay-r/nlp-thirdgate/blob/master/llm/transformers_llama.py
🟡 GPU Usage: 12.3 GB (FP16/FP32 mode), which is suitable for a T4 (1.5 GB less than the Qwen-distilled version).
🐌 Performance on a T4 instance: ~0.19 tokens/sec (FP32 mode) and ~0.22-0.30 tokens/sec (FP16 mode). Should it be that slow? 🤔
Model name: deepseek-ai/DeepSeek-R1-Distill-Llama-8B
⭐ Framework: https://github.com/nicolay-r/bulk-chain
🌌 Notebooks and models hub: https://github.com/nicolay-r/nlp-thirdgate

published a model 12 days ago

mkurman/Llama-3.2-MedIT-SUN-2.5B-BT-GRPO

Updated 7 days ago • 177 • 2

reacted to fuzzy-mittenz's post with 😎🤗👀🔥🚀❤️ 13 days ago

Post

2609

Not many seemed to notice, but what was probably meant to be a WIN for artists' rights from the US Copyright Office has solved some fundamental issues for the community.
In our recent article I outline how companies like Suno, OpenAI, Midjourney, etc. can no longer claim any right to copy the work that you create with their platforms.
We also look at other ways this study and the new rules for AI will fundamentally affect creators who use it, and at how companies' incentives to give them control over certain aspects might change because of this. It's broken down pretty well here: https://huggingface.co/blog/fuzzy-mittenz/copyright-in-ai

replied to Jaward's post 13 days ago

Yeah, the fun part is that I use any QA dataset in GRPO just by instructing a model to follow simple rules. Place your answer in \boxed{} or ** ** tags. I do a regex, and it simply works.
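
A rough sketch of what such a regex check could look like, in case it helps. The function names and the exact-match scoring are my assumptions for illustration, not the actual code; it takes the last \boxed{} answer if there is one, otherwise the last ** ** span, and it does not handle nested braces.

```python
import re

# \boxed{...} without nested braces, and **...** bold spans.
BOXED = re.compile(r"\\boxed\{([^{}]*)\}")
BOLD = re.compile(r"\*\*(.+?)\*\*")

def extract_answer(completion):
    """Return the last \\boxed{} answer, else the last ** ** answer, else None."""
    boxed = BOXED.findall(completion)
    if boxed:
        return boxed[-1].strip()
    bold = BOLD.findall(completion)
    if bold:
        return bold[-1].strip()
    return None

def accuracy_reward(completion, reference):
    """Hypothetical reward: 1.0 if the extracted answer matches the reference exactly."""
    answer = extract_answer(completion)
    return 1.0 if answer is not None and answer == reference.strip() else 0.0


boxed_answer = extract_answer("So the result is \\boxed{42}.")
bold_answer = extract_answer("Final answer: **Paris**")
```

Any QA dataset with a short reference answer can then be scored by calling `accuracy_reward` on each sampled completion.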