Using Linux Kernel Tiering with Compute Express Link (CXL) Memory

In this blog post, we will walk through the process of enabling the Linux kernel Transparent Page Placement (TPP) feature with CXL memory mapped as NUMA nodes in system-ram mode. This feature allows the kernel to automatically place pages in different types of memory based on their usage patterns.

Prerequisites

This guide assumes a Fedora 36 system running kernel 5.19.13, with a Samsung CXL device installed. You can confirm the presence of the CXL device with the following command:

lspci | grep CXL

Step 1: Verify Automatic Memory Onlining

First, we need to verify if the OS automatically onlines memory. This can be done with the following command:

grep CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE /boot/config-$(uname -r)

If the output is CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE=y, then the OS is configured to automatically online memory.
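Both the build-time default and the runtime policy can be inspected together. Below is a minimal sketch; the helper name check_auto_online is ours, and the /boot and /sys paths are the standard Linux locations:

```shell
#!/bin/sh
# Sketch: report whether memory hotplug auto-onlining is enabled.
# check_auto_online is a hypothetical helper name.
check_auto_online() {
    cfg="/boot/config-$(uname -r)"
    # Build-time default (the config file may be absent on some distros)
    [ -r "$cfg" ] && grep CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE "$cfg"
    # Runtime policy: offline, online, online_kernel, or online_movable
    policy=/sys/devices/system/memory/auto_online_blocks
    [ -r "$policy" ] && echo "runtime policy: $(cat "$policy")"
    return 0
}
check_auto_online
```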

Step 2: Change the Default Memory Zone

Next, we set the default zone for newly onlined memory to ZONE_MOVABLE. Note that a plain "sudo echo value > file" does not work here, because the redirection is performed by the unprivileged shell; pipe through sudo tee instead:

echo online_movable | sudo tee /sys/devices/system/memory/auto_online_blocks

Step 3: Convert the Namespace

We then use daxctl to convert all CXL devices from devdax mode to system-ram mode. This can be done with the following command (root privileges are required):

sudo daxctl reconfigure-device --mode=system-ram --force all
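To see which mode each device is in before and after the conversion, daxctl list can be consulted. Below is a minimal sketch; the wrapper name convert_cxl_to_system_ram is ours, and the invocation is guarded so the script is a harmless no-op on machines without daxctl or root privileges:

```shell
#!/bin/sh
# Sketch: inspect dax device modes, then convert them to system-ram.
# convert_cxl_to_system_ram is a hypothetical wrapper name; run it as
# root on a machine with daxctl installed.
convert_cxl_to_system_ram() {
    daxctl list    # each entry reports "mode": devdax or system-ram
    daxctl reconfigure-device --mode=system-ram --force all
    daxctl list    # confirm the mode changed
}
# Guarded invocation: only run where daxctl exists and we are root.
if command -v daxctl >/dev/null 2>&1 && [ "$(id -u)" -eq 0 ]; then
    convert_cxl_to_system_ram
fi
```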

Step 4: Verify NUMA Output

At this point, you should see two nodes in the NUMA output: the CPU with its local DRAM (node 0) and the Samsung CXL device as a CPU-less, memory-only node (node 1). You can check this with the following command:

numactl -H

Step 5: Display Memory Blocks by NUMA Node and Zone

You can display the memory blocks by NUMA node and Zone with the following command:

lsmem -o +NODE,ZONES

Step 6: Enable Kernel Transparent Page Placement (TPP)

Finally, we can enable Kernel Transparent Page Placement (TPP). First, check the default setting for page demotions:

cat /sys/kernel/mm/numa/demotion_enabled

If the output is false, enable it with the following command (piped through sudo tee, since the write requires root):

echo true | sudo tee /sys/kernel/mm/numa/demotion_enabled

Then, enable promotion by setting NUMA balancing to memory-tiering mode (a value of 2):

echo 2 | sudo tee /proc/sys/kernel/numa_balancing

Lastly, enable zone reclaim. This ensures that demotion is run to maintain a minimum number of free pages on each NUMA node:

echo 1 | sudo tee /proc/sys/vm/zone_reclaim_mode
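The three writes above can be wrapped in one small script. Below is a sketch; the function name enable_tpp is ours, and it must run as root because it writes directly to /sys and /proc:

```shell
#!/bin/sh
# Sketch: enable all three TPP knobs in one step. enable_tpp is a
# hypothetical helper name; it must run as root.
enable_tpp() {
    # Demote cold pages from DRAM toward the CXL node under memory pressure
    echo true > /sys/kernel/mm/numa/demotion_enabled
    # 2 = NUMA balancing in memory-tiering mode (promotes hot pages back)
    echo 2 > /proc/sys/kernel/numa_balancing
    # Maintain a minimum of free pages per node so demotion keeps running
    echo 1 > /proc/sys/vm/zone_reclaim_mode
}
# Only attempt the writes when running as root:
[ "$(id -u)" -eq 0 ] && enable_tpp
true
```

Note that these settings do not persist across reboots; the numa_balancing and zone_reclaim_mode values can be made persistent via a drop-in under /etc/sysctl.d/ (kernel.numa_balancing = 2, vm.zone_reclaim_mode = 1).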

And that’s it! You have now enabled the Linux kernel Transparent Page Placement (TPP) feature with CXL memory mapped as NUMA nodes in system-ram mode.

Please note that this guide is based on a specific system configuration and may need to be adjusted based on your specific hardware and software setup. Always refer to the official documentation for the most accurate and up-to-date information.

Linux NUMA Distances Explained

TL;DR: The memory access latency between a node and itself is normalized to 10 (1.0x). Every other distance is scaled relative to that base value of 10. For example, a distance of 21 (2.1x) between NUMA nodes 0 and 1 means that if node 0 accesses memory on node 1, or vice versa, the access latency will be 2.1x that of local memory.
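These per-node distance rows are also exposed in sysfs, not just in the numactl -H output. Below is a minimal sketch that prints them; the helper name print_numa_distances is ours:

```shell
#!/bin/sh
# Sketch: print each NUMA node's distance row from sysfs. In the
# two-node example above, node0's row would read "10 21" (self = 10,
# remote node = 21, i.e. 2.1x local latency).
print_numa_distances() {
    for d in /sys/devices/system/node/node[0-9]*; do
        [ -r "$d/distance" ] || continue
        printf '%s: %s\n' "${d##*/}" "$(cat "$d/distance")"
    done
    return 0
}
print_numa_distances
```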

Introduction

Non-Uniform Memory Access (NUMA) is a multiprocessor model in which each processor is connected to dedicated memory but may access memory attached to other processors in the system. To date, we’ve commonly used DRAM for main memory, but next-gen platforms will begin offering High-Bandwidth Memory (HBM) and Compute Express Link (CXL) attached memory. Accessing remote (to the CPU) memory takes much longer than accessing local memory, and not all remote memory has the same access latency. Depending on how the memory architecture is configured, NUMA nodes can be multiple hops away with each hop adding more latency. HBM and CXL devices will appear as memory-only (CPU-less) NUMA nodes.

How to Create a Bootable Windows USB in Fedora Linux

In this tutorial, I am going to show you how to create a Windows Server 2019 bootable USB in Linux, though any Windows version will work. I am using Fedora 30 for this tutorial but the steps should be valid for other Linux distributions as well.

Here’s what you need:

  • Windows Server 2019 ISO (or Windows 10 ISO)

  • WoeUSB Application

  • A USB key (pen drive or stick) with at least 6 GB of space

How To Monitor Persistent Memory Performance on Linux using PCM, Prometheus, and Grafana

In a previous article, I showed How To Install Prometheus and Grafana on Fedora Server. This article demonstrates how to use the open-source Processor Counter Monitor (PCM) utility to collect DRAM and Intel® Optane™ Persistent Memory statistics, and visualize the data in Grafana.

Processor Counter Monitor is an application programming interface (API) and a set of tools based on the API to monitor performance and energy metrics of Intel® Core™, Xeon®, Atom™ and Xeon Phi™ processors. It can also show memory bandwidth for DRAM and Intel Optane Persistent Memory devices. PCM works on Linux, Windows, Mac OS X, FreeBSD and DragonFlyBSD operating systems.
