Blog Posts

vLLM Recipe: RedHatAI/Qwen3.6-35B-A3B-NVFP4 on DGX Spark

vLLM Recipe: RedHatAI/Qwen3.6-35B-A3B-NVFP4 on DGX Spark

This is a vLLM Recipe - a production-ready Docker Compose configuration for running open-weight models on local hardware. It documents the exact setup, configuration rationale, and benchmark results so you can get a model running quickly. You are welcome to change the parameters to suit your workloads. This worked for me, so I hope you find it helpful.

This recipe covers Qwen3.6-35B-A3B-NVFP4 - a Mixture-of-Experts model with 35B total parameters but only ~3B active at inference - quantized to NVFP4 by Red Hat AI and running on the NVIDIA DGX Spark (my GigaByte AI Top Atom) with a GB10 Blackwell GPU and 128 GB of unified CPU/GPU memory.

Read More
Self-Hosting Firecrawl on Ubuntu 25.04 with Docker Compose

Self-Hosting Firecrawl on Ubuntu 25.04 with Docker Compose

Modern AI agents — Claude Code, Codex, OpenClaw, Hermes-Agent, and custom LangChain pipelines — need a way to read the web. Not raw HTML full of navigation debris, cookie banners, and JavaScript noise, but clean structured text that a language model can actually reason about. Firecrawl is the missing piece: an open-source web scraping and crawling API that fetches any URL and returns clean Markdown, ready to drop straight into a context window or a RAG pipeline.

Read More

Building an Agentic Team for an Open Source Project with Claude Code

A core engineer on MemMachine — the one who owned the Semantic Memory subsystem — left the project. The codebase didn’t grow any less complex overnight, but the human attention available to maintain it did. That’s a familiar shape of problem in any open source project, and it’s the exact shape where a well-designed Claude Code agent team earns its keep.

This post documents what I built: a 22-agent maintenance team that lives entirely inside MemMachine’s repository, coordinates via Claude Code’s experimental Agent Teams runtime, and operates under a design I can reproduce for any existing repository with real code. The agents don’t push code, don’t sign commits, don’t merge pull requests, and don’t cut releases — humans still gatekeep every consequential action. What the agents do do is the tedious and error-prone middle of software maintenance: triage, spec drafting, implementation, QA, security review, docs, dependency and upstream tracking.

Read More
Using the API to Find Free Hosted Models on NVIDIA Builder

Using the API to Find Free Hosted Models on NVIDIA Builder

The NVIDIA Developer Program provides access to a wide catalog of AI models through NVIDIA Inference Microservices (NIM), offering an OpenAI-compatible API. You can browse and discover available models at build.nvidia.com/explore/discover .

If you want to find models with free hosted endpoints in the browser, you can enable the “Free Endpoint” filter on the model catalog page. But what if you need that information programmatically – in a script, a CI pipeline, or as part of an automated workflow? The browser filter is not accessible through the API, and the /v1/models endpoint does not distinguish between free hosted models and everything else.

Read More
Design for Developers: Features, Flags, and the CLI

Design for Developers: Features, Flags, and the CLI

Series: Building lib3mf-rs

This post is part of a 5-part series on building a comprehensive 3MF library in Rust:

  1. Part 1: My Journey Building a 3MF Native Rust Library from Scratch
  2. Part 2: The Library Landscape - Why Build Another One?
  3. Part 3: Into the 3MF Specification Wilderness - Reading 1000+ Pages of Specifications
  4. Part 4: Design for Developers - Features, Flags, and the CLI
  5. Part 5: Reflections and What’s Next - Lessons from Building lib3mf-rs

Understanding the specifications was one thing. Designing a library that developers would actually want to use? That was another challenge entirely. I’ve worked on many libraries over the years, and I’ve learned a lot about what makes a good library, for example the Persistent Memory Development Kit . The difference is understanding how Rust does things.

Read More
Into the 3MF Specification Wilderness: Reading 1000+ Pages of Specifications

Into the 3MF Specification Wilderness: Reading 1000+ Pages of Specifications

Series: Building lib3mf-rs

This post is part of a 5-part series on building a comprehensive 3MF library in Rust:

  1. Part 1: My Journey Building a 3MF Native Rust Library from Scratch
  2. Part 2: The Library Landscape - Why Build Another One?
  3. Part 3: Into the 3MF Specification Wilderness - Reading 1000+ Pages of Specifications
  4. Part 4: Design for Developers - Features, Flags, and the CLI
  5. Part 5: Reflections and What’s Next - Lessons from Building lib3mf-rs

“How hard can it be? It’s just a file format.”

That’s what I thought before I started reading the 3MF specifications. After reading, re-reading, and getting AI to help summarize and dig deeper into the interpretations and understandings, I was ready to begin.

Read More
The Library Landscape: Why Build Another One?

The Library Landscape: Why Build Another One?

Series: Building lib3mf-rs

This post is part of a 5-part series on building a comprehensive 3MF library in Rust:

  1. Part 1: My Journey Building a 3MF Native Rust Library from Scratch
  2. Part 2: The Library Landscape - Why Build Another One?
  3. Part 3: Into the 3MF Specification Wilderness - Reading 1000+ Pages of Specifications
  4. Part 4: Design for Developers - Features, Flags, and the CLI
  5. Part 5: Reflections and What’s Next - Lessons from Building lib3mf-rs

“Why not just use the existing library?”

It’s a fair question. One I asked myself many times during the early days of this project. The 3MF Consortium maintains lib3MF , a comprehensive C++ implementation used by major companies in additive manufacturing. Why build another one?

Read More
Tags