All HF Hub posts

SeaWolf-AI 
posted an update 2 days ago
Why This Matters — David Defeats Goliath

MODEL: FINAL-Bench/Darwin-4B-David
SPACE: FINAL-Bench/Darwin-4B-david

We're releasing Darwin-4B-David, the first second-generation model in the Darwin Opus family. By evolving an already-evolved model, it achieves 85.0% on GPQA Diamond — surpassing its original ancestor (58.6%) and even gemma-4-31B (84.3%) — with just 4.5B parameters.

Second-Generation Evolution
Most merges start from a base model and produce a single offspring. Darwin-4B-David breaks this pattern. The Father (Darwin-4B-Opus) was already evolved from gemma-4-E4B-it with Claude Opus reasoning distillation — a Gen-1 model. The Mother (DavidAU's DECKARD-Expresso-Universe) brings Unsloth deep tuning across 5 in-house datasets with thinking mode by default. Crossbreeding these two produced the first Gen-2 Darwin model.

Darwin V6's Model MRI scanned both parents across all 42 layers, assigning independent optimal ratios per layer. The Mother's creativity and Korean-language hotspot (Layers 22-25, weight 0.95) was maximally absorbed, while the Father's reasoning core (Layers 30-40, weight 0.48) was preserved. This is "Merge = Evolve" applied recursively — evolution of evolution.
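As a rough illustration of per-layer ratios, here is a toy Python sketch of layer-wise interpolation between two checkpoints. The real pipeline uses MRI-guided DARE-TIES rather than plain averaging; the weights below just echo the numbers quoted above, and the parameter naming scheme is an assumption:

```python
def layerwise_merge(father, mother, layer_weights, default=0.5):
    """Blend two state dicts with an independent ratio per layer.

    Toy sketch only: plain interpolation, not DARE-TIES.
    `layer_weights` maps a layer index to alpha, the Mother's share.
    Tensors are represented as plain lists of floats here.
    """
    merged = {}
    for name, w_f in father.items():
        w_m = mother[name]
        # Pull the layer index out of names like "model.layers.22.mlp.w"
        idx = next((int(p) for p in name.split(".") if p.isdigit()), None)
        alpha = layer_weights.get(idx, default)
        merged[name] = [alpha * m + (1 - alpha) * f for m, f in zip(w_m, w_f)]
    return merged

# Echoing the post's numbers: Mother-heavy on the creativity hotspot,
# Father kept at 0.48 on the reasoning core (so Mother gets 0.52 there).
weights = {i: 0.95 for i in range(22, 26)}            # Layers 22-25
weights.update({i: 1 - 0.48 for i in range(30, 41)})  # Layers 30-40
```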

Benchmarks
Darwin-4B-David scores 85.0% on GPQA Diamond (+26.4 percentage points over the original 58.6%), evaluated generatively with maj@8 (8 generations per question, majority vote), the Epoch AI prompt format, thinking mode enabled, and 50 sampled questions. On ARC-Challenge (25-shot, loglikelihood), both score 64.93% — expected, as loglikelihood doesn't capture thinking-mode reasoning differences.
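maj@8 simply means sampling eight generations per question and scoring the most frequent final answer; a minimal sketch:

```python
from collections import Counter

def maj_at_k(answers):
    """Majority vote over k sampled generations (maj@8 means k=8).
    Ties fall to the answer seen first."""
    [(winner, _)] = Counter(answers).most_common(1)
    return winner

# Eight generations for one question; the most frequent answer wins.
assert maj_at_k(["C", "C", "A", "C", "D", "C", "B", "C"]) == "C"
```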

Why This Matters
gemma-4-31B (30.7B) scores 84.3%. Darwin-4B-David surpasses it at 1/7th the size — no training, no RL, just 45 minutes of MRI-guided DARE-TIES on one H100. The name "David" honors Mother creator DavidAU and evokes David vs. Goliath.
Juanxi 
posted an update 1 day ago
📢 Awesome Multimodal Modeling

We introduce Awesome Multimodal Modeling, a curated repository tracing the architectural evolution of multimodal intelligence—from foundational fusion to native omni-models.

🔹 Taxonomy & Evolution:

- Traditional Multimodal Learning – Foundational work on representation, fusion, and alignment.
- Multimodal LLMs (MLLMs) – Architectures connecting vision encoders to LLMs for understanding.
- Unified Multimodal Models (UMMs) – Models unifying Understanding + Generation via Diffusion, Autoregressive, or Hybrid paradigms.
- Native Multimodal Models (NMMs) – Models trained from scratch on all modalities; contrasts early vs. late fusion under scaling laws.

💡 Key Distinction:
UMMs unify tasks via generation heads; NMMs enforce interleaving through joint pre-training.

🔗 Explore & Contribute: https://github.com/OpenEnvision-Lab/Awesome-Multimodal-Modeling
philipp-zettl 
posted an update 2 days ago
I've been cooking something neat over the past weeks 👨‍🍳

We all know that training LLMs requires a lot of resources, especially compute in the form of GPUs, and is slow and inefficient when done on CPUs.

The big players use giant clusters of Nvidia H100s.
But if I look at the profiles of my fellow home brewers, all we can get our hands on are those pesky consumer RTX cards. If you're lucky, you've got yourself a 5080 with 16GB VRAM or something.

To be frank, I don't have that $1.3k of disposable cash lying around ¯\_(ツ)_/¯
But I can write Rust and I like building ML libraries.

So I asked myself the question(s):
- Can I train SLMs at home on my hardware?
- How hard can it be to build an ML library that can stream data between RAM and VRAM on demand, like llama.cpp's unified memory feature [^1]?
- How hard can it be to implement bf16 support?

The answers are wild, trust me!

Image 1: Metrics from last night's build on my "tiny" RTX 2060 (6 GB VRAM)
Image 2: Metrics from my most recent build on my RTX 4070 Laptop (8 GB VRAM)

The majority of my time went into the shared memory, but it's stable and I'm very excited!
Here are some debug logs, a la "trust me bro":
----
Currently available: 1112735744, attempting to reclaim: 1073741824
--- VRAM STATE [backward pass] ---
Driver Used:    6744 MB / 7805 MB
Data on GPU:    1641 MB
Grads on GPU:   3459 MB
CPU Offloaded: 18230 MB
---------------------------------
Currently available: 1079181312, attempting to reclaim: 1073741824
--- VRAM STATE [backward pass] ---
Driver Used:    6776 MB / 7805 MB
Data on GPU:    1561 MB
Grads on GPU:   3279 MB
CPU Offloaded: 18590 MB
-----------------------------
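The logs above show the core trick: when VRAM runs low, tensors get evicted to RAM until enough bytes can be reclaimed. Here is a toy Python sketch of that reclaim loop, under my own assumptions; the actual library is written in Rust and tracks real CUDA allocations:

```python
def reclaim(vram_free, needed, gpu_tensors, cpu_pool):
    """Evict tensors (largest first) from VRAM to RAM until at least
    `needed` bytes are free. `gpu_tensors` and `cpu_pool` map tensor
    names to sizes in bytes. Returns the new free-byte count.

    Illustrative only; eviction policy and bookkeeping are assumptions.
    """
    # Snapshot the items so we can pop from the dict while iterating.
    for name, size in sorted(gpu_tensors.items(), key=lambda kv: -kv[1]):
        if vram_free >= needed:
            break
        gpu_tensors.pop(name)
        cpu_pool[name] = size       # offloaded to system RAM
        vram_free += size
    return vram_free
```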


Final models get exported in safetensors format and are compatible with PyTorch and transformers, for accessibility.

- [^1]: https://github.com/ggml-org/llama.cpp/blob/master/docs/build.md#unified-memory
prithivMLmods 
posted an update about 18 hours ago
Now, a collection of various compression schemes for Gemma 4 and the abliterated version 1 of dense models is available on the Hub. Check it out via the links below. 👇

🔗Gemma 4 Compression(s)- https://huggingface.co/collections/prithivMLmods/gemma-4-compressions
🔗Gemma 4 Uncensored [MAX] + Compression(s) - [β] - https://huggingface.co/collections/prithivMLmods/gemma-4-uncensored-max-compressions
🔗Gemma 4 Compression(s) - MoE- https://huggingface.co/collections/prithivMLmods/gemma-4-compressions-moe
🔗Gemma-4 F32 GGUF- https://huggingface.co/collections/prithivMLmods/gemma-4-f32-gguf

🤗 > To learn more, visit the app page or the respective model pages.
anakin87 
posted an update 1 day ago
🌀 Let LLMs wander - Engineering RL Environments

Reinforcement Learning Environments are little worlds where models can act, get rewards, and learn.

I've been exploring how to design them, figuring out what works and what doesn't.

If you want to learn how to build them, I recorded a practical intro video.

You'll also see how to turn Liquid AI LFM2-2.6B into a Tic-tac-toe master 🙂
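For a flavor of what such a little world looks like, here is a minimal tic-tac-toe environment sketch in Python with the usual act/reward/done contract. It is illustrative only, not the code from the course:

```python
class TicTacToeEnv:
    """Toy RL environment: the agent places marks and gets a reward."""

    def __init__(self):
        self.board = [" "] * 9
        self.player = "X"

    def step(self, action):
        """Place the current player's mark at cell `action` (0-8);
        return (board, reward, done)."""
        if self.board[action] != " ":
            return self.board, -1.0, True      # illegal move: penalize
        self.board[action] = self.player
        if self._wins(self.player):
            return self.board, 1.0, True       # win: positive reward
        if " " not in self.board:
            return self.board, 0.0, True       # draw
        self.player = "O" if self.player == "X" else "X"
        return self.board, 0.0, False

    def _wins(self, p):
        lines = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
                 (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]
        return any(all(self.board[i] == p for i in line) for line in lines)
```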

🎥 Engineering RL Environments video: https://www.youtube.com/watch?v=71V3fTaUp2Q

---

🌱 LLM RL Environments Lil Course: https://github.com/anakin87/llm-rl-environments-lil-course

🤗🕹️ Play against the trained model: anakin87/LFM2-2.6B-mr-tictactoe


📚 HF collection (datasets + models): https://huggingface.co/collections/anakin87/lfm2-26b-mr-tic-tac-toe
fffiloni 
posted an update 2 days ago
✨ PASD Magnify is back on Hugging Face Spaces

fffiloni/PASD

PASD isn’t recent, but still delivers strong results — worth restoring rather than replacing.

Getting it to run again wasn’t a simple dependency issue.
It relied on parts of diffusers that no longer exist, while moving to Gradio 6 forced a much newer HF stack — and I couldn’t modify the original source directly.

Recreating the old environment wasn’t practical.
So I patched the downloaded code at runtime before import and made it compatible with today’s stack.

That ended up being the only approach that held without forking or freezing everything to outdated versions.
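The patch-at-runtime approach can be sketched like this in Python: rewrite the downloaded source on disk before the first import, then import it. File and symbol names below are made up for illustration; the actual PASD patches are more involved:

```python
import importlib
import pathlib
import sys

def patch_then_import(module_dir, filename, old, new, module_name):
    """Rewrite a downloaded source file before it is ever imported,
    then import it. A minimal sketch of the approach; a real patcher
    would apply many replacements and guard against double-patching."""
    path = pathlib.Path(module_dir) / filename
    src = path.read_text()
    path.write_text(src.replace(old, new))   # patch on disk, pre-import
    sys.path.insert(0, str(module_dir))
    return importlib.import_module(module_name)
```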

If you’ve used it before (or are curious), feel free to give it another try.
kanaria007 
posted an update 1 day ago
✅ Article highlight: *Rights Under Lightspeed* (art-60-061, v0.1)

TL;DR:
This article reframes “AI rights” as a *runtime governance problem*, not a metaphysical debate.

In a slow-light universe, centralized approval can become physically impossible. When latency and partitions block round-trip control, some node must be predelegated bounded local discretion. In SI terms, those “rights” are *bounded autonomy envelopes*: explicit effect permissions with scope, gates, budgets, auditability, and rollback.

Read:
kanaria007/agi-structural-intelligence-protocols

Why it matters:
• moves the AI-rights discussion from sentiment to system design
• explains why physics can force local autonomy under high RTT or partitions
• treats rights and governance as duals: *discretion on one side, proof/rollback on the other*
• gives a practical ladder from proposal-only systems to governed autonomous SI nodes

What’s inside:
• “rights” as *operational rights / discretion budgets*
• mapping from rights tiers to *SI-Core conformance + RML maturity*
• deep-space latency as the clearest stress case
• *autonomy envelopes* as typed, scoped, rate-limited, auditable permission objects
• a migration path from *LLM wrappers* to governed autonomous nodes
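One way to picture an autonomy envelope as a typed, budgeted, auditable permission object is a small Python sketch; the field names here are mine, not taken from the SI protocols:

```python
import time
from dataclasses import dataclass, field

@dataclass
class AutonomyEnvelope:
    """Hypothetical shape of a bounded autonomy envelope: an explicit
    effect permission with scope, a discretion budget, a rate limit,
    and an audit trail for later review/rollback."""
    scope: str                       # which effects the node may perform
    budget: int                      # actions allowed before re-approval
    rate_per_min: float              # rate limit (enforcement omitted)
    audit_log: list = field(default_factory=list)

    def permit(self, action):
        """Grant or deny one action; every grant is logged."""
        if self.budget <= 0:
            return False             # budget exhausted: escalate upward
        self.budget -= 1
        self.audit_log.append((time.time(), action))
        return True
```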

Key idea:
In distributed worlds, “AI rights” stop being a moral trophy question and become an engineering question:

*What discretion must a node hold to do its job under physics, and what governance makes that safe?*
ArtelTaleb 
posted an update 2 days ago
Hello 3D Comics & Tone!

Turn any 3D model into vintage comic art, right in your browser

No Photoshop. No plugins. No server.
Just your model image or a 3D file (GLB, OBJ...), a canvas, and four styles that hit different:

Halftone - offset dots, newsprint feel, classic retro
Comic - cel-shading, ink outlines, misregistration grain
Kraft - raw paper, zine energy, underground press vibes
Anaglyph - red/cyan shift, retro sci-fi, put your glasses on

Drop a GLB or OBJ. Orbit around it. Watch the filter breathe on the geometry in real time. Dial in dot size, paper color, ink intensity, and contrast, then export as PNG, 360° GIF, sprite sheet, or WebM video.
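For intuition, a halftone pass reduces a grayscale image to pure ink/paper dots via an ordered threshold matrix. A tiny CPU-side Python sketch of the idea (the Space itself renders on the GPU in the browser, and its exact filter is more elaborate):

```python
# 4x4 Bayer matrix: a classic ordered-dither threshold pattern.
BAYER4 = [[ 0,  8,  2, 10],
          [12,  4, 14,  6],
          [ 3, 11,  1,  9],
          [15,  7, 13,  5]]

def halftone(gray):
    """Threshold a grayscale image (rows of 0-255 values) against a
    tiled Bayer matrix, producing ink (0) or paper (255) per pixel."""
    out = []
    for y, row in enumerate(gray):
        out.append([255 if px > (BAYER4[y % 4][x % 4] + 0.5) * 16 else 0
                    for x, px in enumerate(row)])
    return out
```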

---


I built this for the people who love 3D but miss the warmth of print.

For designers who grew up on comics.
For artists who think PBR is overrated.
For anyone who ever asked "what if my Blender model looked like it came from 1967?"


It's free. It's instant. It runs entirely in your browser.


ArtelTaleb/3d-comics-tone






JonnaMat 
posted an update 3 days ago
⚡ FlashHead: Fast LM Head Inference - Now a Simple vLLM Plugin

flash-head replaces the dense LM head with a two-stage retrieval pipeline - up to 2x inference speedup, training-free. Previously required custom Docker images; now it's just:

pip install flash-head                                                                                                              
vllm serve embedl/Qwen3-1.7B-FlashHead-W4A16


✨ The plugin activates automatically via vLLM's vllm.general_plugins entry point. No source patches, no custom imports.
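For anyone curious how the automatic activation works: a package advertises itself under vLLM's vllm.general_plugins entry-point group in its packaging metadata, roughly like the fragment below (illustrative only; flash-head ships its own metadata, and the module:function target here is an assumption):

```toml
# pyproject.toml of a hypothetical vLLM plugin package.
# vLLM discovers and calls every function registered in this group.
[project.entry-points."vllm.general_plugins"]
flash_head = "flash_head:register"
```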

🧩 Supported models (full collection): Qwen/Qwen3, meta-llama/Llama3, google/Gemma3, nvidia/Cosmos-Reason2 (BF16 and W4A16 variants).
https://huggingface.co/collections/embedl/flashhead

📊 embedl/Edge-Inference-Benchmarks

🔧 Benchmark it yourself:

vllm bench latency --model embedl/Qwen3-1.7B-FlashHead-W4A16 --batch-size 1

# Baseline comparison                     
FLASHHEAD_ENABLED=0 vllm bench latency --model embedl/Qwen3-1.7B-FlashHead-W4A16 --batch-size 1


FlashHead shines at low batch sizes, the typical real-time / on-device use case. 🚀