All HF Hub posts

prithivMLmods 
posted an update about 19 hours ago
Introducing the Super-OCRs Demo: a comparison of state-of-the-art multimodal OCR VLMs, including HunyuanOCR, DeepSeek-OCR, Dots, and Nanonets, in one Space for performing OCR, rendering LaTeX and Markdown, and visual grounding (layout). Find the related Spaces and models below, plus a snippet for calling the demo from code.🤗🔥

✨Super-OCRs[Demo]: prithivMLmods/Super-OCRs-Demo
✨GitHub: https://github.com/PRITHIVSAKTHIUR/Super-OCRs-Demo
✨Collection: https://huggingface.co/collections/prithivMLmods/multimodal-implementations

⭐ Models Used:
✦ HunyuanOCR: tencent/HunyuanOCR
✦ DeepSeek-OCR: prithivMLmods/DeepSeek-OCR-Latest-BF16.I64 (based on deepseek-ai/DeepSeek-OCR)
✦ Dots.OCR: prithivMLmods/Dots.OCR-Latest-BF16 (based on rednote-hilab/dots.ocr)
✦ Nanonets-OCR2-3B: nanonets/Nanonets-OCR2-3B

⭐ Some Other Relevant Apps:
✦ Qwen3-VL-HF-Demo: prithivMLmods/Qwen3-VL-HF-Demo
✦ Qwen3-VL-Outpost: prithivMLmods/Qwen3-VL-Outpost
✦ Multimodal-OCR: prithivMLmods/Multimodal-OCR
✦ Multimodal-OCR2: prithivMLmods/Multimodal-OCR2
✦ Multimodal-OCR3: prithivMLmods/Multimodal-OCR3
✦ DeepSeek-OCR-experimental: prithivMLmods/DeepSeek-OCR-experimental
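
If you want to drive the Super-OCRs demo Space from code rather than the UI, gradio_client can do it. The Space's endpoint names and parameters aren't documented in this post, so the sketch below only connects and inspects them rather than assuming a signature:

```python
# Minimal sketch: connect to the demo Space with gradio_client and list
# its endpoints; exact endpoint names/arguments are not assumed here.
from gradio_client import Client

client = Client("prithivMLmods/Super-OCRs-Demo")
print(client.view_api())  # shows callable endpoints and their parameters
```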

To learn more, visit the app info section or the respective Model Garden page!
Nymbo 
posted an update 2 days ago
🚀 I've just shipped a major update to the Nymbo/Tools MCP server: the Agent_Terminal, a single "master tool" that cuts token usage by over 90%!

Anthropic found 98.7% context savings using code execution with MCP, and Cloudflare published similar findings. This is my open-source implementation of the same idea.

# The Problem

Traditional MCP exposes every tool definition directly to the model. With 12 tools, that's thousands of tokens consumed *before the conversation even starts*. Each tool call also passes intermediate results through the context window — a 10,000-row spreadsheet? That's all going into context just to sum a column.

# The Solution: One Tool to Rule Them All

Agent_Terminal wraps all 12 tools (Web_Search, Web_Fetch, File_System, Generate_Image, Generate_Speech, Generate_Video, Deep_Research, Memory_Manager, Obsidian_Vault, Shell_Command, Code_Interpreter) into a single Python code execution gateway.

Instead of the model making individual tool calls, it writes Python code that orchestrates the tools directly:

```python
# Search for the current Bitcoin price
result = Web_Search("current price of bitcoin", max_results=3)
print(result)
```


Don't know what tools are available? The agent can discover them at runtime:

```python
print(search_tools('image'))    # Find tools by keyword
print(usage('Generate_Image'))  # Get full docs for a specific tool
```
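
This is also where the context savings come from: large intermediates stay inside the execution environment. Here's a minimal sketch of the spreadsheet case from above, assuming File_System exposes a read action that returns the file's text (the real signature may differ):

```python
# Sum one column of a 10,000-row spreadsheet inside the terminal; only the
# final total is printed back into the model's context.
# File_System's call signature below is an assumption.
import csv, io

raw = File_System("read", path="sales.csv")
total = sum(float(row["amount"]) for row in csv.DictReader(io.StringIO(raw)))
print(f"Total: {total:,.2f}")
```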


The individual direct tool calls are all still there, but they can be disabled when using the Agent_Terminal. Try it now: https://www.nymbo.net/nymbot
  • 1 reply
samerzaher80 
posted an update 2 days ago
AetherMind_SRL: How I beat 7B models on MMLU with 184M params and a $300 GPU
I’m Sameer, a solo researcher from Iraq working on a single RTX 3050 8GB laptop. Today I’m releasing AetherMind_SRL, a 184M-parameter NLI model that was trained only on NLI tasks (SNLI, MNLI, ANLI, and a small clinical Alzheimer’s dataset).
It was never fine-tuned on, or even shown, a single MMLU question during training. Yet here are the zero-shot MMLU (57 subjects) results:

| Model | Params | MMLU Zero-Shot | Training Data |
|---|---|---|---|
| AetherMind_SRL (me) | 184M | 36.05% | Only NLI (SNLI/MNLI/ANLI + ADNI) |
| DeBERTa-v3-base | 278M | ~30.8% | General pre-training |
| BERT-large | 340M | 27–30% | General pre-training |
| LLaMA-1 7B | 7B | 34–35% | Massive text corpus |
| LLaMA-2 7B | 7B | ~45% | Bigger + better data |
Yes – my 184M model beats every classic 300–400M model and the original 7-billion-parameter LLaMA-1, all while running at 300+ samples/sec on a $300 laptop GPU.

How did this happen? I built a standardized self-improvement loop called AetherMind Self-Reflective Learning (SRL) v1.0 (sketched in code below):

1. Train normally on NLI
2. Let the model predict on hard adversarial data (ANLI)
3. Log every mistake + low-confidence case
4. Build a balanced “SMART” buffer (60% errors + 40% correct anchors)
5. Fine-tune with tiny LR and error-weighted loss
6. Repeat until stable
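
For concreteness, here is a minimal sketch of that loop. The helpers passed in (train_epoch, predict_with_confidence, fine_tune) are hypothetical stand-ins, since the actual training code isn't shown in this post:

```python
# Illustrative sketch of the SRL v1.0 loop; helper functions are
# hypothetical stand-ins for the real training code.
import random

def srl_loop(model, train_epoch, predict_with_confidence, fine_tune,
             nli_data, anli_data, rounds=5, conf_threshold=0.7):
    train_epoch(model, nli_data)                  # 1. train normally on NLI
    for _ in range(rounds):
        errors, anchors = [], []
        for ex in anli_data:                      # 2. predict on hard adversarial data (ANLI)
            label, conf = predict_with_confidence(model, ex)
            if label != ex["gold"] or conf < conf_threshold:
                errors.append(ex)                 # 3. log mistakes + low-confidence cases
            else:
                anchors.append(ex)
        # 4. balanced "SMART" buffer: 60% errors + 40% correct anchors
        n_anchors = min(len(anchors), (2 * len(errors)) // 3)
        buffer = errors + random.sample(anchors, n_anchors)
        random.shuffle(buffer)
        # 5. fine-tune with a tiny LR and an error-weighted loss
        fine_tune(model, buffer, lr=1e-6,
                  weights=[2.0 if ex in errors else 1.0 for ex in buffer])
        # 6. repeat until stable (e.g., held-out ANLI accuracy plateaus)
    return model
```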
That’s it. No external knowledge, no MMLU data, no cluster. Just pure reasoning transfer from entailment/contradiction patterns → real-world knowledge.

Try it yourself:
```python
from transformers import pipeline
import torch

nli_pipeline = pipeline(
    "text-classification",
    model="samerzaher80/AetherMind_SRL",
    device=0 if torch.cuda.is_available() else -1
)

# DEFINE YOUR TEST HERE
premise = "Patient shows progressive memory decline."
hypothesis = "The patient is cognitively healthy."  # any claim to test against the premise

input_text = f"{premise} [SEP] {hypothesis}"
result = nli_pipeline(input_text)[0]
print(f"Prediction: {result['label']}")
print(f"Confidence: {result['score']:.4f}")
```
Model: samerzaher80/AetherMind_SRL
flozi00 
posted an update 3 days ago
When models get too large for a single GPU, simply stacking layers vertically (Pipeline Parallelism) isn't always the answer. Sometimes, you need to slice the matrices themselves.

My latest guide breaks down the hardware mechanics of Tensor Parallelism (TP). We look at how to shard individual operations across devices to make a cluster function as one massive accelerator.

This isn't high-level theory; it is a look at the bare-metal implementation.

Here is what is covered in the deep dive:

**The Strategies: Column vs. Row Parallelism**
We analyze how to split weight matrices (W) and inputs (X).

- Column-Linear: splits weights by columns. Requires an All-Gather to reconstruct the output.
- Row-Linear: splits weights by rows. Requires an All-Reduce to sum partial results.

**The "Megatron-LM" Optimization**
Efficiency comes from minimizing communication. By sandwiching the non-linearity (GeLU) between a Column-Parallel layer and a Row-Parallel layer, we can skip synchronization entirely during the activation phase. This cuts communication events by 50% per block.

**The Hardware Reality: The Bandwidth Wall**
In TP, the dist.all_reduce operation sits on the critical path. The CUDA cores effectively stall while waiting for the ring-reduce to finish.

- Intra-Node: works well because NVLink provides enough bandwidth to hide this latency.
- Inter-Node: fails at scale. Standard networking (Ethernet/InfiniBand) is too slow for the high-frequency syncs required by TP.

The article includes a raw PyTorch implementation using torch.distributed primitives to show exactly where the data moves and where the bottlenecks sit.
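
As a teaser, here is a compressed sketch of the pattern (a simplified illustration, not the article's implementation), assuming a torch.distributed process group is already initialized and dimensions divide evenly by world_size:

```python
# Column-parallel Linear -> GeLU -> row-parallel Linear, with a single
# all_reduce at the end (the Megatron-style MLP block).
import torch
import torch.distributed as dist
import torch.nn.functional as F

class ColumnParallelLinear(torch.nn.Module):
    def __init__(self, d_in, d_out, world_size):
        super().__init__()
        # each rank holds a column shard of W: [d_in, d_out // world_size]
        self.weight = torch.nn.Parameter(torch.randn(d_in, d_out // world_size))

    def forward(self, x):
        # local matmul yields a shard of the output; no communication here
        return x @ self.weight

class RowParallelLinear(torch.nn.Module):
    def __init__(self, d_in, d_out, world_size):
        super().__init__()
        # each rank holds a row shard of W: [d_in // world_size, d_out]
        self.weight = torch.nn.Parameter(torch.randn(d_in // world_size, d_out))

    def forward(self, x_shard):
        partial = x_shard @ self.weight
        # the All-Reduce on the critical path: sum partial results across ranks
        dist.all_reduce(partial, op=dist.ReduceOp.SUM)
        return partial

def mlp_block(x, col_layer, row_layer):
    # GeLU is applied rank-locally between the two layers, so the activation
    # phase needs no synchronization: the 50% communication saving above.
    return row_layer(F.gelu(col_layer(x)))
```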

Read the full hardware-centric guide here:
https://flozi.net/en/guides/ai/scaling/tensor_parallel
MonsterMMORPG 
posted an update 2 days ago
FLUX 2 vs FLUX SRPO, New FLUX Training Kohya SS GUI Premium App With Presets & Features : https://youtu.be/RQHmyJVOHXo

FLUX 2 has been published and I have compared it to the very best FLUX base model, known as FLUX SRPO. Moreover, we have updated our FLUX training app and presets to the next level: massive speed-up gains with zero quality loss, plus lots of new features. I will show all of the new features in the new SECourses Kohya SS GUI Premium app and compare FLUX SRPO trained model results with FLUX 2.

https://youtu.be/RQHmyJVOHXo

Get the SECourses Premium Kohya Trainer DreamBooth / Fine Tuning : [ https://www.patreon.com/posts/Kohya-FLUX-DreamBooth-Trainer-App-112099700 ]

Get the SECourses Premium Kohya Trainer LoRA : [ https://www.patreon.com/posts/Kohya-FLUX-LoRA-Trainer-App-110879657 ]

DreamBooth Training Tutorial: [ https://www.youtube.com/watch?v=FvpWy1x5etM ]

LoRA Training Tutorial: [ https://www.youtube.com/watch?v=nySGu12Y05k ]

Qwen Image Realism Tutorial: [ https://youtu.be/XWzZ2wnzNuQ ]

Join our Discord Community: [ https://discord.com/servers/secourses-Discord-772774097734074388 ]

⏱️ Video Chapters:
0:00 Introduction to New FLUX Training Improvements and Local Training Showcase
0:24 Understanding FLUX SRPO Model: High Realism with Minimal VRAM Requirements
0:38 Updated Configurations for Training Realism on 6GB VRAM GPUs Locally
1:07 FLUX 2 Announcement and Setting Up Comparisons with BFL Playground
1:45 FLUX 2 Dev Model Technical Specs: 32 Billion Parameters and Hardware Challenges
2:11 Overview of Changes in SECourses Premium Kohya Trainer Version 35
2:46 Development Updates: GUI Improvements and Full Torch Compile Support
3:13 LoRA Presets Update: VRAM Optimization and Speed Improvements via Torch Compile
3:27 Introducing On-the-Fly FP8 Scaled LoRA Training Support
3:42 Quality Comparison Analysis: BF16 vs FP8 Scaled Weights LoRA
4:24 VRAM Usage and Speed Analysis: Block Swap Count Reduction with FP8 Scaled
....
  • 3 replies
ronantakizawa 
posted an update 4 days ago
Introducing the japanese-trending-words dataset: 593 words from Japan’s annual trending-word rankings (流行語大賞) from 2006-2025. The dataset provides the top 30 words from each year, with their meanings in Japanese and English. This resource is great for NLP tasks that involve understanding recent Japanese culture and history.

ronantakizawa/japanese-trending-words
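
For a quick look, you can load it with the datasets library; the split and column names below are assumptions, so check the dataset card for the actual schema:

```python
from datasets import load_dataset

# Load the trending-words dataset; the "train" split is an assumption.
ds = load_dataset("ronantakizawa/japanese-trending-words", split="train")
print(ds)     # features and row count
print(ds[0])  # one trending word with its Japanese and English meaning
```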

#japanese #japanesedataset #trending


Babsie 
posted an update about 11 hours ago
Accidentally made a totally blank, non-functioning HF Space while poking “what does this do” buttons without reading anything.
I’d already named it its-totally-supposed-to-do-that.
It is, in fact, totally supposed to do that. Trickster QA passed.


sergiopaniego 
posted an update about 17 hours ago
nanochat is now in transformers!

The LLM by @karpathy is officially in the library, and we wrote a blog covering how we ported the model, how it differs from the original, and how to run or train it.

go read it 🤓

nanochat-students/transformers
wang12390 
posted an update about 17 hours ago
How To Use the Miragic V1.1 Image Generator for Marketing

Hello, I was trying to generate face-recognition videos for social marketing, since our company is a biometrics company and I needed image materials. I tried several image-generation services, but this Miragic V1.1 Image Generator was the best.
If you are also interested, please visit https://miragic.ai/products/image-generator