Steve Scargall

Steve Scargall

Steve Scargall is a Senior Product Manager and Software Architect at MemVerge , delivering software-defined memory solutions using Compute Express Link (CXL) devices. Steve works with industry leaders in the CXL hardware vendor, Original Equipment Manufacturers (OEMs), Cloud Service Providers (CSPs), Enterprise, and System Integrator spaces to architect cutting-edge solutions. Steve holds a bachelor’s degree in BSc Applied Computer Science and Cybernetics from the University of Reading, UK. He has made significant contributions to the SNIA NVM Programming Technical Work Group, PMDK, NDCTL, IPMCTL, and other memory-centric open-source projects. Steve is the author of Programming Persistent Memory: A Comprehensive Guide for Developers .

Using the API to Find Free Hosted Models on NVIDIA Builder

Using the API to Find Free Hosted Models on NVIDIA Builder

The NVIDIA Developer Program provides access to a wide catalog of AI models through NVIDIA Inference Microservices (NIM), offering an OpenAI-compatible API. You can browse and discover available models at build.nvidia.com/explore/discover .

If you want to find models with free hosted endpoints in the browser, you can enable the “Free Endpoint” filter on the model catalog page. But what if you need that information programmatically – in a script, a CI pipeline, or as part of an automated workflow? The browser filter is not accessible through the API, and the /v1/models endpoint does not distinguish between free hosted models and everything else.

Read More
Design for Developers: Features, Flags, and the CLI

Design for Developers: Features, Flags, and the CLI

Series: Building lib3mf-rs

This post is part of a 5-part series on building a comprehensive 3MF library in Rust:

  1. Part 1: My Journey Building a 3MF Native Rust Library from Scratch
  2. Part 2: The Library Landscape - Why Build Another One?
  3. Part 3: Into the 3MF Specification Wilderness - Reading 1000+ Pages of Specifications
  4. Part 4: Design for Developers - Features, Flags, and the CLI
  5. Part 5: Reflections and What’s Next - Lessons from Building lib3mf-rs

Understanding the specifications was one thing. Designing a library that developers would actually want to use? That was another challenge entirely. I’ve worked on many libraries over the years, and I’ve learned a lot about what makes a good library, for example the Persistent Memory Development Kit . The difference is understanding how Rust does things.

Read More
Into the 3MF Specification Wilderness: Reading 1000+ Pages of Specifications

Into the 3MF Specification Wilderness: Reading 1000+ Pages of Specifications

Series: Building lib3mf-rs

This post is part of a 5-part series on building a comprehensive 3MF library in Rust:

  1. Part 1: My Journey Building a 3MF Native Rust Library from Scratch
  2. Part 2: The Library Landscape - Why Build Another One?
  3. Part 3: Into the 3MF Specification Wilderness - Reading 1000+ Pages of Specifications
  4. Part 4: Design for Developers - Features, Flags, and the CLI
  5. Part 5: Reflections and What’s Next - Lessons from Building lib3mf-rs

“How hard can it be? It’s just a file format.”

That’s what I thought before I started reading the 3MF specifications. After reading, re-reading, and getting AI to help summarize and dig deeper into the interpretations and understandings, I was ready to begin.

Read More
The Library Landscape: Why Build Another One?

The Library Landscape: Why Build Another One?

Series: Building lib3mf-rs

This post is part of a 5-part series on building a comprehensive 3MF library in Rust:

  1. Part 1: My Journey Building a 3MF Native Rust Library from Scratch
  2. Part 2: The Library Landscape - Why Build Another One?
  3. Part 3: Into the 3MF Specification Wilderness - Reading 1000+ Pages of Specifications
  4. Part 4: Design for Developers - Features, Flags, and the CLI
  5. Part 5: Reflections and What’s Next - Lessons from Building lib3mf-rs

“Why not just use the existing library?”

It’s a fair question. One I asked myself many times during the early days of this project. The 3MF Consortium maintains lib3MF , a comprehensive C++ implementation used by major companies in additive manufacturing. Why build another one?

Read More
My Journey Building a 3MF Native Rust Library from Scratch

My Journey Building a 3MF Native Rust Library from Scratch

For the past few years, I’ve been getting more and more into 3D printing as a hobbyist. Like everyone, I started with one, a Bambu Lab X1 Carbon, which has now grown to three printers. I find the hobby fascinating as it entangles software, firmware, hardware, physics, and materials science.

As a software engineer, I’m naturally drawn to the software side of things (Slicer and Firmware). But what interests me most, is how the software interacts with the hardware and the materials. How the slicer translates the 3D model into instructions for the printer (G-Code). How the printer executes those instructions. How the materials behave under the printer’s control.

Read More
I Added a Feature to OrcaSlicer to Show Travel Distance and Moves

I Added a Feature to OrcaSlicer to Show Travel Distance and Moves

OrcaSlicer is a powerful and popular slicer for 3D printers, known for its rich feature set and active development community. In this blog post, we’ll take a closer look at a new feature I proposed and implemented that provides more insight into your prints: the display of total travel distance and the number of travel moves. See the feat: Display travel distance and move count in G-code summary for more details.

Read More
How to Build OrcaSlicer from Source on macOS 15 Sequoia - A Step-by-Step Guide

How to Build OrcaSlicer from Source on macOS 15 Sequoia - A Step-by-Step Guide

Building OrcaSlicer from source on macOS 15 (15.6.1 Sequoia) can be straightforward, but recent changes in macOS, Xcode, and CMake require some extra care. This guide updates the official instructions with important tips and fixes from this GitHub issue to avoid common build issues.

For this article, we will be using this build system:

Prerequisites

Before you start, ensure you have the following installed:

Read More
How to Build acpidump from Source and use it to Debug Complex CXL and PCI Issues

How to Build acpidump from Source and use it to Debug Complex CXL and PCI Issues

This article is a detailed guide on how to build the latest version of the acpidump tool from its source code. While many Linux distributions, like Ubuntu, offer a packaged version of this utility, it’s often outdated. For developers and enthusiasts working with modern hardware features, particularly those related to Compute Express Link (CXL), having the most current version is essential.

Before you begin, it’s important to remove any old, conflicting versions of the tools. If you have previously installed the acpica-tools package from your distribution’s repository, you should remove it to prevent conflicts.

Read More
Is Your Application Really Using Persistent Memory? Here’s How to Tell.

Is Your Application Really Using Persistent Memory? Here’s How to Tell.

Persistent memory (PMEM), especially when accessed via technologies like CXL, promises the best of both worlds: DRAM-like speed with the durability of an SSD. When you set up a filesystem like XFS or EXT4 in FSDAX (File System Direct Access) mode on a PMEM device, you’re paving a superhighway for your applications, allowing them to map files directly into their address space and bypass the kernel’s page cache entirely.

But here’s the crucial question: after all the setup and configuration, how do you prove that your application’s data is physically residing on the PMEM device and not just in regular RAM? I’ve run into this question myself, so I wrote a small Python script to get a definitive answer using SQLite3 as an example application. However, before we proceed with the script, let’s examine how you can verify this manually.

Read More
How to Confirm Virtual to Physical Memory Mappings for PMem and FSDAX Files

How to Confirm Virtual to Physical Memory Mappings for PMem and FSDAX Files

Are you curious whether your application’s memory-mapped files are really using Intel Optane Persistent Memory (PMem), Compute Express Link (CXL) Non-Volatile Memory Modules (NV-CMM), or another DAX-enabled persistent memory device? Want to understand how virtual memory maps onto physical, non-volatile regions? Let’s use easily adaptable scripts in both Python and C to confirm this on your Linux system, definitively.

Why Does This Matter?

With the advent of persistent memory and DAX (Direct Access) filesystems, applications can memory-map files directly onto PMem, bypassing the traditional DRAM page cache. This promises significant performance and durability improvements for data-intensive workloads and databases, such as SQLite, Redis, and others.

Read More
CXL Memory NUMA Node Mapping with Sub-NUMA Clustering (SNC) on Linux

CXL Memory NUMA Node Mapping with Sub-NUMA Clustering (SNC) on Linux

CXL (Compute Express Link) memory devices are revolutionizing server architectures, but they also introduce new NUMA complexity, especially when advanced memory configurations, such as Sub-NUMA Clustering (SNC), are enabled. One of the most confusing issues is the mismatch between NUMA node numbers reported by CXL sysfs attributes and those used by Linux memory management tools.

This blog post walks through a real-world scenario, complete with command outputs and diagrams, to help you understand and resolve the CXL NUMA node mapping issue with SNC enabled.

Read More
CXL Device & Fabric Buyer's Guide: A List of GA Components (2025)

CXL Device & Fabric Buyer's Guide: A List of GA Components (2025)

Last Updated: June 27, 2025

This guide provides a curated list of generally available (GA) Compute Express Link (CXL) devices, fabric components, and memory appliances. It is a technical resource for engineers, architects, and hardware specialists looking to identify and compare CXL memory expansion modules, switches, and full system-level appliances from leading vendors. The tables below detail market-ready components, focusing on the specifications required to design and build CXL-enabled infrastructure.

Read More
CXL Server Buyer's Guide: A Complete List of GA Platforms (Updated 2025)

CXL Server Buyer's Guide: A Complete List of GA Platforms (Updated 2025)

Last Updated: June 27, 2025

This quick reference guide provides a definitive, up-to-date list of generally available (GA) Compute Express Link (CXL) servers from major OEMs like Dell, HPE, Lenovo, and Supermicro. It is designed for data center architects, engineers, and IT decision-makers who need to identify and compare server platforms that support CXL 1.1 and CXL 2.0 for memory expansion and pooling. The tables below offer a direct comparison of server models, supported CPUs, CXL versions, and compatible CXL device form factors. The goal is to cut through the noise of announcements and roadmaps to provide a clear view of what you can deploy today.

Read More
Your Personal Codespace: Self-Host VS Code on Any Server

Your Personal Codespace: Self-Host VS Code on Any Server

GitHub Codespaces and other cloud IDEs have revolutionized development, offering a complete VS Code environment that runs on a remote server and is accessible from any browser. It’s a game-changer for productivity and flexibility.

But what if you could have that same powerful, seamless experience on your own terms?

This guide will show you how to build your very own private Codespace, replicating the convenience of the GitHub experience on any server you control—be it a machine in your home lab, a dedicated server, or a budget-friendly cloud VM. We’ll explore two distinct paths to get you up and running with a persistent, browser-based VS Code instance on Ubuntu 24.04, complete with AI assistants like Gemini and GitHub Copilot to boost your workflow.

Read More
Unlock Your CXL Memory: How to Switch from NUMA (System-RAM) to Direct Access (DAX) Mode

Unlock Your CXL Memory: How to Switch from NUMA (System-RAM) to Direct Access (DAX) Mode

As a Linux System Administrator working with Compute Express Link (CXL) memory devices, you should be aware that as of Linux Kernel 6.3, Type 3 CXL.mem devices are now automatically brought online as memory-only NUMA nodes. While this can be beneficial for most situations, it might not be ideal if your application is designed to directly manage the CXL memory as a DAX (Direct Access) device using mmap().

This blog post will explain this behavior and provide a step-by-step guide on how to convert a CXL memory device from a memory-only NUMA node back to DAX mode, allowing applications to mmap the underlying /dev/daxX.Y device. We’ll also cover troubleshooting steps if the memory is actively in use by the kernel or other processes.

Read More
Fastfetch: The Speedy Successor Neofetch Replacement Your Ubuntu Terminal Needs

Fastfetch: The Speedy Successor Neofetch Replacement Your Ubuntu Terminal Needs

If you love customizing your Linux terminal and getting a quick, visually appealing overview of your system specs, you might have used neofetch in the past. However, neofetch is now deprecated and no longer actively maintained. A fantastic, actively maintained alternative is Fastfetch – known for its speed, extensive customization options, and feature set.

While you might be able to install Fastfetch on Ubuntu 22.04 (Jammy Jellyfish) using the standard sudo apt install fastfetch, the version available in the default Ubuntu repositories is often outdated. To get the latest features, bug fixes, and performance improvements, you’ll want to use a different method.

Read More
How I Created a Custom ChatGPT Trained on the CXL Specification Documents

How I Created a Custom ChatGPT Trained on the CXL Specification Documents

If you’re working with Compute Express Link (CXL) and wish you had an AI assistant trained on all the different versions of the specification—1.0, 1.1, 2.0, 3.0, 3.1… you’re in luck.

Whether you’re a CXL device vendor, a firmware engineer, a Linux Kernel developer, a memory subsystem architect, a hardware validation engineer, or even an application developer working on CXL tools and utilities, chances are you’ve had to reference the CXL spec at some point. And if you have, you already know: these documents are dense, extremely technical, and constantly evolving.

Read More
I Turned Myself Into an Action Figure

I Turned Myself Into an Action Figure

Part of being in tech, especially in emerging memory technology, is constantly switching between the serious and the surreal. One day you’re in kernel debug mode, the next you’re explaining complex system architectures on a whiteboard, and then suddenly you’re jumping on the latest craze such as making yourself into an action figure.

It’s fun. It’s human. And honestly? It’s a reminder not to take yourself too seriously. (Even if your job title suggests differently)

Read More
Linux Kernel 6.14 is Released: This is What's New for Compute Express Link (CXL)

Linux Kernel 6.14 is Released: This is What's New for Compute Express Link (CXL)

The Linux Kernel 6.14 release brings several improvements and additions related to Compute Express Link (CXL) technology.

Here is the detailed list of all commits merged into the 6.14 Kernel for CXL and DAX. This list was generated by the Linux Kernel CXL Feature Tracker .

Read More
Building NDCTL Utilities from Source: A Comprehensive Guide

Building NDCTL Utilities from Source: A Comprehensive Guide

Building NDCTL with Meson on Ubuntu 24.04

The NDCTL package includes the cxl, daxctl, and ndctl utilities. It uses the Meson build system for streamlined compilation. This guide reflects the modern build process for managing NVDIMMs, CXL, and PMEM on Ubuntu 24.04.

If you do not install a more recent Kernel than the one provided by the distro, then it is not recommended to compile these utilities from source code. If you have installed a mainline Kernel, then you will likely require a newer version of these utilities that are compatible with your Kernel. See the NDCTL Releases as the Kernel support information is provided there.

Read More
Linux Kernel 6.13 is Released: This is What's New for Compute Express Link (CXL)

Linux Kernel 6.13 is Released: This is What's New for Compute Express Link (CXL)

The Linux Kernel 6.13 release brings several improvements and additions related to Compute Express Link (CXL) technology.

Here is the detailed list of all commits merged into the 6.13 Kernel for CXL and DAX. This list was generated by the Linux Kernel CXL Feature Tracker .

CXL related changes from Kernel v6.12 to v6.13:

Read More
Understanding STREAM: Benchmarking Memory Bandwidth for DRAM and CXL

Understanding STREAM: Benchmarking Memory Bandwidth for DRAM and CXL

In today’s Artificial Intelligence (AI), Machine Learning (ML), and high-performance computing (HPC) landscape, memory bandwidth is a critical factor in determining overall system performance. As workloads grow increasingly data-intensive, traditional DRAM-only setups are often insufficient, prompting the rise of new memory expansion technologies like Compute Express Link (CXL). To evaluate memory bandwidth across DRAM and CXL devices, we use a modified industry-standard tool called STREAM.

In this blog, we’ll explore what STREAM is, how it works, why it’s commonly used for benchmarking memory bandwidth, and how a modified version of STREAM can be used to measure performance in heterogeneous memory environments, including DRAM and CXL.

Read More
A Step-by-Step Guide on Using Cloud Images with QEMU 9 on Ubuntu 24.04

A Step-by-Step Guide on Using Cloud Images with QEMU 9 on Ubuntu 24.04

Introduction

Cloud images are pre-configured, optimized templates of operating systems designed specifically for cloud and virtualized environments. Cloud images are essentially vanilla operating system installations, such as Ubuntu, with the addition of the cloud-init package. This package enables run-time configuration of the OS through user data, such as text files on an ISO filesystem or cloud provider metadata. Using cloud images significantly reduces the time and effort required to set up a new virtual machine. Unlike ISO images, which require a full installation process, cloud images boot up immediately with the OS pre-installed

Read More
Remote Development Using VS Code and SSH with AWS EC2

Remote Development Using VS Code and SSH with AWS EC2

How to Perform Remote Code Development Using VS Code on a Remote AWS EC2 Instance via SSH

Remote development has become a crucial tool for developers, enabling the convenience of coding and deploying directly to remote environments. In this blog, I’ll walk you through the process of setting up Visual Studio Code (VS Code) to develop remotely on an AWS EC2 instance using SSH. By the end of this guide, you’ll be able to connect to your EC2 instance from VS Code and develop code as if you were working locally.

Read More
How Much RAM Could a Vector Database Use If a Vector Database Could Use RAM

How Much RAM Could a Vector Database Use If a Vector Database Could Use RAM

Featured image generated by ChatGPT 4o model: “a low poly woodchuck by a serene lake, surrounded by mountains and a forest with tree leaves made from DDR memory modules. The woodchuck is munching on a memory DIMM. The only memory DIMM in the image should be the one being eaten.”

How Much RAM Could a Vector Database Use If a Vector Database Could Use RAM?

Although the title is a punn from the famous “woodchuck rhyme,” the question is serious for LLM applications using vector databases. As large language models (LLMs) continue to evolve, leveraging vector databases to store and search embeddings is critical. Understanding the memory usage of these systems is essential for maintaining performance, response times, and ensuring system scalability.

Read More
Understanding Memory Usage with `smem`

Understanding Memory Usage with `smem`

Memory management is crucial for Linux administrators and developers, especially when optimizing performance for resource-intensive applications. While tools like top and htop are commonly used to monitor system performance, they often don’t provide enough detail regarding memory usage breakdown. This is where smem comes into play.

What is smem?

smem is a command-line tool that reports memory usage per process and provides better insight into shared memory than most traditional tools, taking shared memory pages into account. Unlike top or htop, which primarily display RSS (Resident Set Size), smem can also show USS (Unique Set Size), which is a better metric for understanding how much memory would be freed if a particular process were terminated. This blog will guide you through using smem, explaining these critical memory metrics and providing comparisons to more familiar tools.

Read More
Benchmarking GPUs: Measuring Throughput between CPU and GPU

Benchmarking GPUs: Measuring Throughput between CPU and GPU

This article was inspired by a LinkedIn post by Dennis Kennetz . The CPU to GPU bandwidth check is available on GitHub which uses a specific flow to assess the data transfer rates. Like many in the industry, my focus is on AI and ML workloads and how we can improve efficiencies and performance using DRAM, CXL, CPU, GPUs, and software improvements.

In the rapidly evolving landscape of artificial intelligence (AI) and machine learning (ML), the ability to process vast amounts of data efficiently is paramount. As AI models grow in complexity and size, the demand for high-performance computing resources intensifies. At the heart of this demand lies the crucial task of optimizing data transfers between various components of a computing system, particularly from DRAM, CPU, and emerging technologies like CXL (Compute Express Link) to and from the GPU.

Read More
50 Generative AI Prompts I use as a Product Manager to Improve Efficiency and Product Quality

50 Generative AI Prompts I use as a Product Manager to Improve Efficiency and Product Quality

As a product manager, the success of our products depends on our ability to make informed, strategic decisions quickly and efficiently. In the fast-paced world of the IT industry, it’s crucial to stay ahead of trends, understand our customers deeply, and align our product development with our business goals. This is where generative AI and a well-crafted set of prompts have become invaluable tools in my daily workflow.

Why I Use These Prompts Daily

Every day, I’m faced with decisions that can significantly impact the trajectory of our product roadmap, customer satisfaction, and ultimately, our bottom line. The prompts I use are not just about generating ideas but about driving actionable insights that are tailored to our specific needs. These prompts cover a wide range of product management aspects—from prioritizing features based on customer feedback to crafting a go-to-market strategy that aligns with our business objectives.

Read More
Linux Kernel 6.10 is Released: This is What's New for Compute Express Link (CXL)

Linux Kernel 6.10 is Released: This is What's New for Compute Express Link (CXL)

The Linux Kernel 6.10 release brings several improvements and additions related to Compute Express Link (CXL) technology.

Here is the detailed list of all commits merged into the 6.10 Kernel for CXL and DAX. This list was generated by the Linux Kernel CXL Feature Tracker .

Read More
Linux Kernel 6.9 is Released: This is What's New for Compute Express Link (CXL)

Linux Kernel 6.9 is Released: This is What's New for Compute Express Link (CXL)

The Linux Kernel 6.9 release brings several improvements and additions related to Compute Express Link (CXL) technology.

New Features

Here is a list of new features for CXL:

Here is the detailed list of all commits merged into the 6.9 Kernel for CXL and DAX. This list was generated by the Linux Kernel CXL Feature Tracker .

Read More
Linux Kernel CXL Feature Tracker

Linux Kernel CXL Feature Tracker

I’m always watching the Linux Kernel for new and exciting features that are merged for Compute Express Link (CXL). There’s some great notes from the monthly developer meetup here , but the devil is always in the details, and not every commit is discussed in the meeting. So I wrote a simple Python script, called cxl_feature_tracker.py that looks in all commits to the Linus Torvalds Linux Kernel GitHub repository , and extracts any that mention “CXL” or “DAX”, or that make changes to the drivers/cxl or drivers/dax directories. The output is a very long list, but it has some gems amongst the list of fixes.

Read More
Running Open WebUI and Ollama on Ubuntu 22.04 for a Local ChatGPT Experience

Running Open WebUI and Ollama on Ubuntu 22.04 for a Local ChatGPT Experience

Introduction

Open WebUI and Ollama are powerful tools that allow you to create a local chat experience using GPT models. Whether you’re experimenting with natural language understanding or building your own conversational AI, these tools provide a user-friendly interface for interacting with language models. In this guide, we’ll walk you through the installation process step by step.

Ollama is a cutting-edge platform designed to run open-source large language models locally on your machine. It simplifies the complexities involved in deploying and managing these models, making it an attractive choice for researchers, developers, and anyone who wants to experiment with language models1. Ollama provides a user-friendly interface for running large language models (LLMs) locally, specifically on MacOS and Linux (with Windows support on the horizon).

Read More
Using Linux Kernel Tiering with Compute Express Link (CXL) Memory

Using Linux Kernel Tiering with Compute Express Link (CXL) Memory

In this blog post, we will walk through the process of enabling the Linux Kernel Transparent Page Placement (TPP) feature with CXL memory mapped as NUMA nodes using the system-ram namespace. This feature allows the kernel to automatically place pages in different types of memory based on their usage patterns.

Prerequisites

This guide assumes that you are using a Fedora 36 system with Kernel 5.19.13, and that your system has a Samsung CXL device installed. You can confirm the presence of the CXL device with the following command:

Read More
Understanding Compute Express Link (CXL) and Its Alignment with the PCIe Specifications

Understanding Compute Express Link (CXL) and Its Alignment with the PCIe Specifications

How CXL Uses PCIe Electricals and Transport Layers

CXL utilizes the PCIe infrastructure, starting with the PCIe 5.0. This ensures compatibility with existing systems while introducing new features for device connectivity and memory coherency. CXL’s ability to maintain memory coherency across shared memory pools is a significant advancement, allowing for efficient resource sharing and operand movement between accelerators and target devices.

CXL builds upon the familiar foundation of PCIe, utilizing the same physical interfaces, transport layer, and electrical signaling. This shared foundation makes CXL integration with existing PCIe systems seamless. Here’s a breakdown of how it works:

Read More
A Practical Guide to Identify Compute Express Link (CXL) Devices in Your Server

A Practical Guide to Identify Compute Express Link (CXL) Devices in Your Server

In this article, we will provide four methods for identifying CXL devices in your server and how to determine which CPU socket and NUMA node each CXL device is connected. We will use CXL memory expansion (CXL.mem) devices for this article. The server was running Ubuntu 22.04.2 (Jammy Jellyfish) with Kernel 6.3 and ‘cxl-cli ’ version 75 built from source code. Many of the procedures will work on Kernel versions 5.16 or newer.

Read More

How To Install a Mainline Linux Kernel in Ubuntu

Note: This article was updated on Thursday, July 31st, 2025 and will work with newer Ubuntu releases.

By default, Ubuntu systems run with the Ubuntu kernels provided by the Ubuntu repositories. To get unmodified upstream kernels that have new features or to confirm that upstream has fixed a specific issue, we often need to install the mainline Kernel. The mainline kernel is the most recent version of the Linux kernel released by the Linux Kernel Organization. It undergoes several stages of development, including merge windows, release candidates, and final releases. Mainline kernels are designed to offer the latest features and improvements, making them attractive to developers and power users. Kernel.org lists the available Kernel versions.

Read More
An Introduction to Generative Prompt Engineeering

An Introduction to Generative Prompt Engineeering

Introduction

Over the past few years, there has been a significant explosion in the use and development of large language models (LLMs). An LLM is a language model consisting of a neural network with many parameters (commonly multi-billions of weights), trained on large quantities of text. Some of the most popular large language models are: GPT-3 (Generative Pretrained Transformer 3) – developed by OpenAI ; BERT (Bidirectional Encoder Representations from Transformers) – developed by Google; RoBERTa (Robustly Optimized BERT Approach) – developed by Facebook AI; T5 (Text-to-Text Transfer Transformer) – developed by Google. Many others exist and continue to emerge. These language models are designed to understand and generate natural language text, allowing for a wide range of applications such as chatbots, content creation, language translation, and more.

Read More
How To Map a CXL Endpoint to a CPU Socket in Linux

How To Map a CXL Endpoint to a CPU Socket in Linux

When working with CXL Type 3 Memory Expander endpoints, it’s nice to know which CPU Socket owns the root complex for the endpoint. This is very useful for memory tiering solutions where we want to keep the execution of application processes and threads ’local’ to the memory.

CXL memory expanders appear in Linux as memory-only or cpu-less NUMA Nodes. For example, NUMA nodes 2 & 3 do not have any CPUs assigned to them.

Read More
Linux NUMA Distances Explained

Linux NUMA Distances Explained

TL;DR: The memory latency distances between a node and itself is normalized to 10 (1.0x). Every other distance is scaled relative to that 10 base value. For example, the distance between NUMA Node 0 and 1 is 21 (2.1x), meaning if node 0 accesses memory on node 1 or vice versa, the access latency will be 2.1x more than for local memory.

Introduction

Non-Uniform Memory Access (NUMA) is a multiprocessor model in which each processor is connected to dedicated memory but may access memory attached to other processors in the system. To date, we’ve commonly used DRAM for main memory, but next-gen platforms will begin offering High-Bandwidth Memory (HBM) and Compute Express Link (CXL) attached memory. Accessing remote (to the CPU) memory takes much longer than accessing local memory, and not all remote memory has the same access latency. Depending on how the memory architecture is configured, NUMA nodes can be multiple hops away with each hop adding more latency. HBM and CXL devices will appear as memory-only (CPU-less) NUMA nodes.

Read More
Using Linux Kernel Memory Tiering

Using Linux Kernel Memory Tiering

In this post, I’ll discuss what memory tiering is, why we need it, and how to use the memory tiering feature available in the mainline v5.15 Kernel.

What is Memory Tiering?

With the advent of various new memory types, some systems will have multiple types of memory, e.g. High Bandwidth Memory (HBM), DRAM, Persistent Memory (PMem), CXL and others. The Memory Storage hierarchy should be familiar to you.

Memory Storage Hierarchy

Read More
How To map VMWare vSphere/ESXi PMem devices from the Host to Guest VM

How To map VMWare vSphere/ESXi PMem devices from the Host to Guest VM

In this post, we’ll use VMWare ESXi 7.0u3 to create a Guest VM running Ubuntu 21.10 with two Virtual Persistent Memory (vPMem) devices, then show how we can map the vPMem device in the host (ESXi) to “nmem” devices in the guest VM as shown by the ndctl utility.

If you’re new to using vPMem or need a refresher, start with the VMWare Persistent Memory documentation.

Table of Contents

Create a Guest VM with vPMem Devices

The procedure you use may be different from the one shown below if you use vSphere or an automated procedure.

Read More
How To Emulate CXL Devices using KVM and QEMU

How To Emulate CXL Devices using KVM and QEMU

What is CXL?

Compute Express Link (CXL) is an open standard for high-speed central processing unit-to-device and CPU-to-memory connections, designed for high-performance data center computers. CXL is built on the PCI Express physical and electrical interface with protocols in three areas: input/output, memory, and cache coherence.

CXL is designed to be an industry open standard interface for high-speed communications, as accelerators are increasingly used to complement CPUs in support of emerging applications such as Artificial Intelligence and Machine Learning.

Read More
How To Build a custom Linux Kernel to test Data Access Monitor (DAMON)

How To Build a custom Linux Kernel to test Data Access Monitor (DAMON)

DAMON is a data access monitoring framework subsystem for the Linux kernel. DAMON (Data Access MONitor) tool monitors memory access patterns specific to user-space processes introduced in Linux kernel 5.15 LTS, such as operation schemes, physical memory monitoring, and proactive memory reclamation. It was designed and implemented by Amazon AWS Labs and upstreamed into the 5.15 Kernel , but it was not enabled by default.cd /boot

Keen to try this new feature to identify the working set size (Active Memory) of a server or process, this post documents the steps I took to build a custom Kernel with DAMON enabled using Fedora Server 35.

Read More
Resolving commands 'Killed' on GCP f1-micro Compute Engine instances

Resolving commands 'Killed' on GCP f1-micro Compute Engine instances

When I want to perform a quick task, I generally spin up a Google GCP Compute Engine instance as they’re cheap. However, they have limited resources, particularly memory. When refreshing the package repositories, it’s quite easy to encounter an Out-of-Memory (OOM) situation which results in the command - yum or dnf - is ‘killed’. For example:

$ sudo dnf update 
CentOS Stream 8 - AppStream                                                                                                  8.3 MB/s |  18 MB     00:02    
CentOS Stream 8 - BaseOS                                                                                                      13 MB/s |  16 MB     00:01    
CentOS Stream 8 - Extras                                                                                                      69 kB/s |  16 kB     00:00    
Google Compute Engine                                                                                                         20 kB/s | 9.4 kB     00:00    
Google Cloud SDK                                                                                                              24 MB/s |  43 MB     00:01    
Killed

dmesg has a lot of information about the situation, but the key line to confirm dnf caused the OOM event, is:

Read More
How To Enable Debug Logging in ipmctl

How To Enable Debug Logging in ipmctl

The ipmctl utility is used for configuring and managing Intel Optane Persistent Memory modules (DCPMM/PMem). It supports the functionality to:

  • Discover Persistent Memory on the server
  • Provision the persistent memory configuration
  • View and update the firmware on the persistent memory modules
  • Configure data-at-rest security
  • Track health and performance of the persistent memory modules
  • Debug and troubleshoot persistent memory modules

I wrote the IPMCTL User Guide showing how to use the tool, but what if ipmctl returns an error or something you’re not expecting? How do you debug the debugger? On Linux, ipmctl relies on libndctl to help perform communication to the BIOS and persistent memory modules themselves. This is a complicated stack involving multiple kernel drivers and the physical hardware itself. Anything along this path could be causing a problem.

Read More
How To Install Prometheus and Grafana on Fedora Server

How To Install Prometheus and Grafana on Fedora Server

[Updated] This article was updated on 03/13/2021 using Fedora Server 33, Prometheus v2.25.0, Grafana v7.4.3, and Node Exporter v1.1.2.

In this article, we will show how to install Prometheus and Grafana to collect and display system performance metrics.

Prometheus is an open source monitoring and alerting toolkit for bare metal systems, virtual machines, containers, and microservices. Grafana allows you to query, visualize, and alert on metrics using fully customizable dashboards .

Read More
How To Monitor Persistent Memory Performance on Linux using PCM, Prometheus, and Grafana

How To Monitor Persistent Memory Performance on Linux using PCM, Prometheus, and Grafana

In a previous article, I showed How To Install Prometheus and Grafana on Fedora Server . This article demonstrates how to use the open-source Process Counter Monitor (PCM) utility to collect DRAM and Intel® Optane™ Persistent Memory statistics, and visualize the data in Grafana.

Processor Counter Monitor is an application programming interface (API) and a set of tools based on the API to monitor performance and energy metrics of Intel® Core™, Xeon®, Atom™ and Xeon Phi™ processors. It can also show memory bandwidth for DRAM and Intel Optane Persistent Memory devices. PCM works on Linux, Windows, Mac OS X, FreeBSD and DragonFlyBSD operating systems.

Read More
How to build an upstream Fedora Kernel from source

How to build an upstream Fedora Kernel from source

I typically keep my Fedora system current, updating it once every week or two. More recently, I wanted to test the Idle Page Tracking feature, but this wasn’t enabled in the default kernel provided by Fedora.

# grep CONFIG_IDLE_PAGE_TRACKING /boot/config-$(uname -r)
# CONFIG_IDLE_PAGE_TRACKING is not set

To enable the feature, we need to build a custom kernel with the feature(s) we need. Thankfully, the process isn’t too difficult.

For this walk through, I’ll be building a customised version of the Fedora 32 kernel version I already have installed (5.8.7-200.fc32.x86_64), using some of the instructions from https://fedoraproject.org/wiki/Building_a_custom_kernel .

Read More
Linux Device Mapper WriteCache (dm-writecache) performance improvements in Linux Kernel 5.8

Linux Device Mapper WriteCache (dm-writecache) performance improvements in Linux Kernel 5.8

The Linux ‘dm-writecache’ target allows for writeback caching of newly written data to an SSD or NVMe using persistent memory will achieve much better performance in Linux Kernel 5.8.

Red Hat developer Mikulas Patocka has been working to enhance the dm-writecache performance using Intel Optane Persistent Memory (PMem) as the cache device.

The performance optimization now queued for Linux 5.8 is making use of CLFLUSHOPT within dm-writecache when available instead of MOVNTI. CLFLUSHOPT is one of Intel’s persistent memory instructions that allows for optimized flushing of cache lines by supporting greater concurrency. The CLFLUSHOPT instruction has been supported on Intel servers since Skylake and on AMD since Zen.

Read More
"ipmctl show -memoryresources" returns "Error: GetMemoryResourcesInfo Failed"

"ipmctl show -memoryresources" returns "Error: GetMemoryResourcesInfo Failed"

Issue:

Running ipmctl show -memoryresources returns an error similar to the following:

# ipmctl show -memoryresources

Error: GetMemoryResourcesInfo Failed

Applies To:

  • Linux & Microsoft Windows

  • Intel Optane Persistent Memory

  • ipmctl utility

Cause:

The Platform Configuration Data (PCD) is invalid or has been erased using a previously executed ipmctl delete -dimm -pcd command or the system has new persistent memory modules that have not been initialized yet.

A module with an empty PCD will show information similar to the following. This shows an example of PCD of DIMM ID 0x0001. To review the PCD for all modules in the system use ipmctl show -dimm -pcd.

Read More

Intel Optane Persistent Memory Modules report "Non-functional" state in ipmctl

Issue

Executing ipmctl show-dimm to get device information shows the persistent memory modules in a ‘Non-functional’ health state, eg:

# ipmctl show -dimm

 DimmID | Capacity | HealthState    | ActionRequired | LockState | FWVersion
=============================================================================
 0x0001 | 0.0 GiB  | Non-functional | N/A            | N/A       | N/A
 0x0011 | 0.0 GiB  | Non-functional | N/A            | N/A       | N/A
 0x0021 | 0.0 GiB  | Non-functional | N/A            | N/A       | N/A
 0x0101 | 0.0 GiB  | Non-functional | N/A            | N/A       | N/A
 0x0111 | 0.0 GiB  | Non-functional | N/A            | N/A       | N/A
 0x0121 | 0.0 GiB  | Non-functional | N/A            | N/A       | N/A
 0x1001 | 0.0 GiB  | Non-functional | N/A            | N/A       | N/A
 0x1011 | 0.0 GiB  | Non-functional | N/A            | N/A       | N/A
 0x1021 | 0.0 GiB  | Non-functional | N/A            | N/A       | N/A
 0x1101 | 0.0 GiB  | Non-functional | N/A            | N/A       | N/A
 0x1111 | 0.0 GiB  | Non-functional | N/A            | N/A       | N/A
 0x1121 | 0.0 GiB  | Non-functional | N/A            | N/A       | N/A

Other ipmctl commands may fail and return “No functional DIMMs in the system.”, eg:

Read More
How To Set Linux CPU Scaling Governor to Max Performance

How To Set Linux CPU Scaling Governor to Max Performance

The majority of modern processors are capable of operating in a number of different clock frequency and voltage configurations, often referred to as Operating Performance Points or P-states (in ACPI terminology). As a rule, the higher the clock frequency and the higher the voltage, the more instructions can be retired by the CPU over a unit of time, but also the higher the clock frequency and the higher the voltage, the more energy is consumed over a unit of time (or the more power is drawn) by the CPU in the given P-state. Therefore there is a natural trade-off between the CPU capacity (the number of instructions that can be executed over a unit of time) and the power drawn by the CPU.

Read More
How To Verify Linux Kernel Support for Persistent Memory

How To Verify Linux Kernel Support for Persistent Memory

Linux Kernel support for persistent memory was first delivered in version 4.0 of the mainline kernel, however, it was not enabled by default until version 4.2.

If you use a Linux distribution that uses kernel 4.2 or later, or the distro backports features in to an older kernel, you will almost certainly have persistent memory support enabled by default. It is still worth verifying what features are enabled and disabled as this may vary by distro and release version for the very latest persistent memory features.

Read More
A Quick Guide to Signing Your Git Commits

A Quick Guide to Signing Your Git Commits

It is important to sign Git commits for your source code to avoid the code being compromised and to confirm to the repository gatekeeper that you are who you say you are. Signing guarantees that my code is my work, it is my copyright and nobody else can fake it. This guide provides the necessary steps to creating private & public keys so you can sign your Git commits.

On Linux or Mac, if you have setup a development environment then you have all the necessary tools for signing.

Read More
Programming Persistent Memory: A Comprehensive Guide for Developers Book

Programming Persistent Memory: A Comprehensive Guide for Developers Book

After many months of hard work by everyone involved, I’m very pleased to announce that the book “Programming Persistent Memory: A Comprehensive Guide for Developers” is now available for download in digital PDF & ePUB formats from https://pmem.io/book , and Kindle & paperback through Amazon .

Beginner and experienced programmers will use this comprehensive guide to persistent memory programming. You will understand how persistent memory brings together several new software/hardware requirements, and offers great promise for better performance and faster application startup times―a huge leap forward in byte-addressable capacity compared with current DRAM offerings.
This revolutionary new technology gives applications significant performance and capacity improvements over existing technologies. It requires a new way of thinking and developing, which makes this highly disruptive to the IT/computing industry. The full spectrum of industry sectors that will benefit from this technology include, but are not limited to, in-memory and traditional databases, AI, analytics, HPC, virtualization, and big data.   
Programming Persistent Memory describes the technology and why it is exciting the industry. It covers the operating system and hardware requirements as well as how to create development environments using emulated or real persistent memory hardware. The book explains fundamental concepts; provides an introduction to persistent memory programming APIs for C, C++, JavaScript, and other languages; discusses RMDA with persistent memory; reviews security features; and presents many examples. Source code and examples that you can run on your own systems are included.
What You’ll Learn
- Understand what persistent memory is, what it does, and the value it brings to the industry
- Become familiar with the operating system and hardware requirements to use persistent memory
- Know the fundamentals of persistent memory programming: why it is different from current programming methods, and what developers need to keep in mind when programming for persistence
- Look at persistent memory application development by example using the Persistent Memory Development Kit (PMDK)
- Design and optimize data structures for persistent memory
- Study how real-world applications are modified to leverage persistent memory
- Utilize the tools available for persistent memory programming, application performance profiling, and debugging
Who This Book Is For
C, C++, Java, and Python developers, but will also be useful to software, cloud, and hardware architects across a broad spectrum of sectors, including cloud service providers, independent software vendors, high performance compute, artificial intelligence, data analytics, big data, etc. 

Read More
How To Extend Volatile System Memory (RAM) using Persistent Memory on Linux

How To Extend Volatile System Memory (RAM) using Persistent Memory on Linux

Intel(R) Optane(TM) Persistent Memory delivers a unique combination of affordable large capacity and support for data persistence. Electrically compatible with DDR4, large capacity modules up to 512GB each can be installed in compatible servers alongside DDR on the memory bus.

Intel’s persistent memory product can be provisioned in a volatile “Memory Mode” which replaces DRAM volatile capacity with the persistent memory capacity, and persistent “AppDirect” mode which presents both DRAM and persistent memory to the operating system and applications. Both modes are explained in more detail here . It is possible to configure a system to utilize a percentage of persistent memory as volatile and persistent, but this mixed-mode still provisions all the DRAM capacity as a Last-Level Cache.

Read More
Using Linux Volume Manager (LVM) with Persistent Memory

Using Linux Volume Manager (LVM) with Persistent Memory

In this article, we show how to use the Linux Volume Manager (LVM) to create concatenated, striped, and mirrored logical volumes using persistent memory modules as the backing storage device. Specifically, we will be using the Intel® Optane™ Persistent Memory Modules on a two socket system with Intel® Cascade Lake Xeon® CPUs, also referred to as 2nd Generation Intel® Xeon® Scalable Processors.

Contents

How to Create a Bootable Windows USB in Fedora Linux

How to Create a Bootable Windows USB in Fedora Linux

In this tutorial, I am going to show you how to create a Windows Server 2019 bootable USB in Linux, though any Windows version will work. I am using Fedora 30 for this tutorial but the steps should be valid for other Linux distributions as well.

Here’s what you need:

  • Windows Server 2019 ISO (or Windows 10 ISO)

  • WoeUSB Application

  • A USB key (pen drive or stick) with at least 6 Gb of space

    Read More
Percona Live 2019

Percona Live 2019

I will be speaking at this year’s Percona Live event. Percona Live 2019 takes place in Austin, Texas from May 28-30, 2019 at the Hyatt Regency . You can register for the event and check out the schedule of talks. See you there.

About Percona Live

Percona Live conferences provide the open source database community with an opportunity to discover and discuss the latest open source trends, technologies and innovations. The conference includes the best and brightest innovators and influencers in the open source database industry.

Read More