Blog Posts

Linux 7.2 Seeds "Blackwell-Next": A Deep Dive into the nvgrace-gpu VFIO CXL DVSEC Change
Linux 7.2’s VFIO pull request included a commit with a codename I hadn’t seen before: Blackwell-Next. A Phoronix post, Linux 7.2 Begins Making Preparations For NVIDIA “Blackwell-Next”, brought it to my attention. On the face of it, the commit looks like a minor prep patch. It is, but it’s also a clean window into where NVIDIA is taking its CPU-coherent GPU stack, how CXL is quietly becoming the standard signaling interface for next-generation accelerators, and what that means if you’re building infrastructure or tooling on top of these platforms.
Read More
Linux Kernel v7.1 is Released: This is What's New for Compute Express Link (CXL)
The Linux Kernel v7.1 release brings several improvements and additions related to Compute Express Link (CXL) technology.
Release Highlights
Linux Kernel v7.1 includes 47 commits to the CXL and DAX subsystems:
| Category | Commits |
|---|---|
| New Features & Hardware | 1 |
| Bug Fixes | 5 |
| Refactoring & Cleanup | 5 |
| Testing | 1 |
| Other | 35 |
The v7.1 CXL/DAX cycle is defined by three interlocking themes: laying the groundwork for Type 2 accelerator support, hardening the DAX/HMEM subsystem against a cluster of correctness bugs, and a focused refactoring of the region layer that splits a monolithic file into purpose-specific translation units. None of these is a headline splash feature on its own, but together they represent the kind of steady, unglamorous investment that makes the subsystem reliable enough to build production systems on.
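As a quick sanity check on the table above, the category counts can be tallied and turned into shares. This is only a small sketch: the numbers are copied from the table, and the variable names are my own.

```python
# Commit counts per category, copied from the table above.
commits = {
    "New Features & Hardware": 1,
    "Bug Fixes": 5,
    "Refactoring & Cleanup": 5,
    "Testing": 1,
    "Other": 35,
}

# The categories should add up to the stated total for the cycle.
total = sum(commits.values())

# Share of each category as a percentage, rounded to one decimal place.
shares = {name: round(100 * count / total, 1) for name, count in commits.items()}

for name, count in commits.items():
    print(f"{name:<26} {count:>3}  {shares[name]:>5}%")
print(f"{'Total':<26} {total:>3}")
```

The tally confirms the categories sum to the 47 commits quoted, with roughly three quarters of them falling under "Other".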
Read More
Graphify + MemMachine: 79× Token Reduction, Zero Vector Database
I help maintain MemMachine, an open-source long-term memory layer for AI agents. It’s a real codebase: 442 source files, 171 docs, a graph database, a SQL store, an MCP server, a REST API, a Python SDK, and integrations with eight different agent frameworks. When a new contributor asks “where does episodic memory actually get written?”, grep, the tool of choice for many AI coding assistants, doesn’t cut it. The answer threads through five files in three folders, plus a docker-compose service definition and a Helm chart. For every question, the assistant has to search all of these files, using the LLM to semantically understand both the question and the files, then piece together an answer. That burns a lot of tokens and fills much of the context window.
Read More
Is Thinking Mode Affecting Your Agentic Workflows?
I jumped on the trend of running local LLMs and agents and was having a lot of fun until my agents kept failing, timing out, and just stopping without any obvious reason. I tried PaperClip + ZeroClaw, PaperClip + Hermes-Agent, and Hermes-Agent + Hermes-Workspace with Qwen 3.6 and Gemma 4 models (various sizes and quantization levels). All of them failed in the same way at some point in the workflow with almost nothing reported in the logs to indicate what was happening. Some tasks completed without any problem, but most did not, often leaving me to wonder what was going on. After many hours of debugging and reading many forums, I finally found that this was a model serving configuration trap that catches many people the first time they self-host a reasoning model.
Read More
How To Run ZeroClaw in Docker with local LLMs (Qwen3 on an NVIDIA DGX Spark)
ZeroClaw is an open-source agent runtime. By default it expects an API key for a frontier model (Claude, OpenAI, and so on). This guide shows how to use a local Qwen3.6 model served by vLLM on an NVIDIA DGX Spark, routed through LiteLLM, with ZeroClaw and Firecrawl running in Docker on a separate host.
It also documents the onboarding bug I hit on a fresh install in v0.7.4 — ZeroClaw issue #6123 — and the config-only workaround.
Read More
Run Free LLMs at Scale: LiteLLM Gateway with Groq, NVIDIA NIM, OpenRouter, and Local vLLM
Introduction
Running large language models is increasingly affordable — but “affordable” rarely means “free, all the time, for every request.” Cloud providers each come with their own rate limits, daily quotas, and occasional model deprecations. Local hardware is fast and private, but not always available (DGX Spark powered down, model being updated, VRAM needed elsewhere). Somewhere between “I have an API key” and “my agents work reliably at scale” is a configuration problem that most guides skip over entirely.
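To make the reliability problem concrete, here is a minimal sketch of ordered fallback across backends, written in plain Python. It is not LiteLLM's API or configuration: the `ProviderUnavailable` exception, the `complete_with_fallback` helper, and the three stand-in backends are all hypothetical names invented for illustration.

```python
from typing import Callable, Sequence


class ProviderUnavailable(Exception):
    """Hypothetical error: a backend is rate-limited, out of quota, or offline."""


def complete_with_fallback(
    prompt: str,
    backends: Sequence[tuple[str, Callable[[str], str]]],
) -> tuple[str, str]:
    """Try each backend in order; return (backend name, response) from the first that works."""
    errors = []
    for name, call in backends:
        try:
            return name, call(prompt)
        except ProviderUnavailable as exc:
            # Record the failure and move on to the next backend in the list.
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all backends failed: " + "; ".join(errors))


# Stand-in backends (assumptions, not real clients) that mimic the failure modes above.
def local_vllm(prompt: str) -> str:
    raise ProviderUnavailable("host powered down")


def cloud_free_tier(prompt: str) -> str:
    raise ProviderUnavailable("daily quota exhausted")


def second_cloud(prompt: str) -> str:
    return f"echo: {prompt}"


name, text = complete_with_fallback(
    "hi",
    [("local", local_vllm), ("cloud-a", cloud_free_tier), ("cloud-b", second_cloud)],
)
print(name, text)
```

The order of the list encodes the preference: local hardware first, then each cloud provider in turn, so a request only fails when every option is exhausted.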
Read More
vLLM Recipe: RedHatAI/Qwen3.6-35B-A3B-NVFP4 on DGX Spark
This is a vLLM Recipe - a production-ready Docker Compose configuration for running open-weight models on local hardware. It documents the exact setup, configuration rationale, and benchmark results so you can get a model running quickly. You are welcome to change the parameters to suit your workloads. This worked for me, so I hope you find it helpful.
This recipe covers Qwen3.6-35B-A3B-NVFP4 - a Mixture-of-Experts model with 35B total parameters but only ~3B active at inference - quantized to NVFP4 by Red Hat AI and running on the NVIDIA DGX Spark (my GigaByte AI Top Atom) with a GB10 Blackwell GPU and 128 GB of unified CPU/GPU memory.
Read More
Self-Hosting Firecrawl on Ubuntu 25.04 with Docker Compose
Modern AI agents — Claude Code, Codex, OpenClaw, Hermes-Agent, and custom LangChain pipelines — need a way to read the web. Not raw HTML full of navigation debris, cookie banners, and JavaScript noise, but clean structured text that a language model can actually reason about. Firecrawl is the missing piece: an open-source web scraping and crawling API that fetches any URL and returns clean Markdown, ready to drop straight into a context window or a RAG pipeline.
Read More