Using Linux Kernel Tiering with Compute Express Link (CXL) Memory

This post walks through enabling the Linux kernel's Transparent Page Placement (TPP) feature with CXL memory mapped as a NUMA node using the system-ram namespace. With TPP enabled, the kernel automatically places pages in different memory tiers based on how they are used: infrequently accessed pages can be demoted to the slower CXL memory, and frequently accessed pages can be promoted back to DRAM.

Prerequisites

This guide assumes that you are using a Fedora 36 system with Kernel 5.19.13, and that your system has a Samsung CXL device installed. You can confirm the presence of the CXL device with the following command:

lspci | grep CXL
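
Beyond the device itself, the steps below write to a handful of sysfs and procfs files. As a rough pre-flight check, the following sketch lists any of those files that are missing on the running kernel. The function name missing_knobs and its optional root-prefix argument are hypothetical helpers introduced here for illustration; the paths are the ones used later in this guide.

```shell
# Print each kernel interface used in this guide that does not exist.
# $1 is an optional prefix (hypothetical, mainly for testing against a
# fake tree); with no argument the real /sys and /proc paths are checked.
missing_knobs() (
    root="${1:-}"
    for knob in \
        /sys/devices/system/memory/auto_online_blocks \
        /sys/kernel/mm/numa/demotion_enabled \
        /proc/sys/kernel/numa_balancing \
        /proc/sys/vm/zone_reclaim_mode
    do
        [ -e "${root}${knob}" ] || echo "$knob"
    done
)
```

Calling missing_knobs with no arguments should print nothing on a kernel that exposes all four interfaces; any path it prints indicates a step that will likely fail later.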

Step 1: Verify Automatic Memory Onlining

First, check whether the kernel is configured to online hot-added memory automatically. This can be done with the following command:

grep CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE /boot/config-$(uname -r)

If the output is CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE=y, then the OS is configured to automatically online memory.
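
If you want to script this check, a small helper along these lines can report the option's value, or "unset" when the option is absent or commented out. The name kconfig_value and the optional config-file argument are assumptions made for this sketch, not part of any existing tool.

```shell
# Print the value of a kernel config option, or "unset" if the option
# is missing or appears only as "# CONFIG_... is not set".
# $1: option name; $2: config file (defaults to the running kernel's).
kconfig_value() (
    option="$1"
    config="${2:-/boot/config-$(uname -r)}"
    value=$(sed -n "s/^${option}=//p" "$config")
    echo "${value:-unset}"
)
```

For example, kconfig_value CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE prints y on a kernel built with automatic onlining. The build-time option only sets the default; the policy currently in effect can be read with cat /sys/devices/system/memory/auto_online_blocks.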

Step 2: Change the Default Memory Zone

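Next, set the default policy so that newly added memory is onlined into ZONE_MOVABLE. The write needs root privileges, and with sudo echo ... > file the redirection is performed by the unprivileged shell, so pipe the value through sudo tee instead:

echo online_movable | sudo tee /sys/devices/system/memory/auto_online_blocks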
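
A slightly more defensive variant validates the policy name before writing and then reads the file back to confirm the kernel accepted it. This is only a sketch: set_auto_online and its optional target-path argument are hypothetical names, and the list of accepted values reflects my understanding of the kernel's memory hotplug documentation.

```shell
# Write an auto-online policy and confirm it took effect.
# $1: policy; $2: target file (defaults to the real sysfs file).
# Must run with enough privileges to write the target.
set_auto_online() (
    policy="$1"
    target="${2:-/sys/devices/system/memory/auto_online_blocks}"
    case "$policy" in
        offline|online|online_kernel|online_movable) ;;
        *) echo "unsupported policy: $policy" >&2; exit 1 ;;
    esac
    echo "$policy" > "$target" || exit 1
    # Reading the file back returns the active policy.
    [ "$(cat "$target")" = "$policy" ]
)
```

Run it from a root shell, for example set_auto_online online_movable. Note that a sysfs write like this does not survive a reboot; as far as I know, the memhp_default_state=online_movable kernel boot parameter is the usual way to make the policy persistent.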

Step 3: Convert the Namespace

We then use daxctl to convert the namespace from devdax to system-ram for every DAX device on the system (here, the CXL devices), which hands the CXL memory to the kernel as ordinary RAM. The command needs root privileges:

sudo daxctl reconfigure-device --mode=system-ram --force all
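
To confirm the conversion, you can inspect the output of daxctl list. The sketch below counts how many devices report a given mode. It assumes daxctl list prints JSON with a "mode" field per device (for example "mode":"system-ram"), which matches the output I have seen but may differ between versions; count_dax_mode is a hypothetical helper name.

```shell
# Read `daxctl list` JSON on stdin and print how many devices
# report the mode given in $1 (e.g. devdax or system-ram).
count_dax_mode() (
    mode="$1"
    grep -Eo "\"mode\" *: *\"${mode}\"" | wc -l | tr -d ' '
)
```

After a successful conversion, daxctl list | count_dax_mode devdax should print 0, and the system-ram count should equal the number of CXL devices.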

Step 4: Verify NUMA Output

At this point, the NUMA output should show two nodes: the single CPU socket with its DRAM (node 0) and the Samsung CXL device (node 1), which appears as a memory-only node with no CPUs. You can check this with the following command:

numactl -H
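
The same information is available from sysfs, which is handy in scripts. As a minimal sketch, the helper below prints every NUMA node whose cpulist is empty, which is how a memory-only node such as the CXL device should show up. The name cpuless_nodes and the optional base-directory argument are assumptions for illustration.

```shell
# Print the NUMA nodes that have no CPUs attached.
# $1: node directory (defaults to /sys/devices/system/node).
cpuless_nodes() (
    base="${1:-/sys/devices/system/node}"
    for dir in "$base"/node[0-9]*; do
        [ -r "$dir/cpulist" ] || continue
        # An empty cpulist means no CPUs belong to this node.
        if [ -z "$(cat "$dir/cpulist")" ]; then
            echo "${dir##*/}"
        fi
    done
)
```

On the system described in this post, cpuless_nodes would be expected to print node1.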

Step 5: Display Memory Blocks by NUMA Node and Zone

You can display the memory blocks by NUMA node and Zone with the following command:

lsmem -o +NODE,ZONES

Step 6: Enable Kernel Transparent Page Placement (TPP)

Finally, we can enable Kernel Transparent Page Placement (TPP). First, check the default setting for page demotions:

cat /sys/kernel/mm/numa/demotion_enabled

If the output is false, enable it with the following command (the write needs root privileges):

echo true | sudo tee /sys/kernel/mm/numa/demotion_enabled

Then, enable promotions by setting NUMA balancing to its memory tiering mode (value 2):

echo 2 | sudo tee /proc/sys/kernel/numa_balancing

Lastly, enable zone reclaim. This makes sure that reclaim, and with it demotion, runs on each NUMA node to maintain a minimum set of free pages there:

echo 1 | sudo tee /proc/sys/vm/zone_reclaim_mode
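
Since these three settings live in different places, it can help to verify them together. The following sketch compares each file against the value used in this guide and exits non-zero if any differs. tpp_check and its optional root-prefix argument are hypothetical names introduced for this example.

```shell
# Compare the three TPP-related settings against the values from this guide.
# $1 is an optional path prefix (hypothetical, for testing on a fake tree).
tpp_check() (
    root="${1:-}"
    status=0
    while read -r knob want; do
        have=$(cat "${root}${knob}" 2>/dev/null)
        if [ "$have" = "$want" ]; then
            echo "ok   $knob = $have"
        else
            echo "FAIL $knob = ${have:-unreadable} (expected $want)"
            status=1
        fi
    done <<EOF
/sys/kernel/mm/numa/demotion_enabled true
/proc/sys/kernel/numa_balancing 2
/proc/sys/vm/zone_reclaim_mode 1
EOF
    exit "$status"
)
```

Running tpp_check after the steps above should print three ok lines. These settings do not persist across reboots, so the check is also useful after a restart.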

And that’s it! You have now enabled the Linux Kernel Transparent Page Placement (TPP) feature with CXL memory mapped as NUMA nodes using the system-ram namespace.
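
To see whether pages are actually moving between tiers, you can watch the demotion and promotion counters in /proc/vmstat. The sketch below simply filters those counters. It assumes the counter names start with pgdemote_ and pgpromote_ (for example pgdemote_kswapd and pgpromote_success), which is what I would expect on a 5.19 kernel, though the exact set varies by version; tiering_counters is a hypothetical helper name.

```shell
# Print the page demotion/promotion counters.
# $1: vmstat file (defaults to /proc/vmstat).
tiering_counters() (
    vmstat="${1:-/proc/vmstat}"
    grep -E '^(pgdemote|pgpromote)_' "$vmstat"
)
```

Running tiering_counters before and after a memory-intensive workload and comparing the values gives a quick indication of tiering activity. If it prints nothing, the running kernel does not expose these counters.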

Please note that this guide is based on a specific system configuration and may need to be adjusted based on your specific hardware and software setup. Always refer to the official documentation for the most accurate and up-to-date information.

How To Install and Boot VMWare VSphere/ESXi from Persistent Memory (or not)

In a previous post, I described how to install and boot Linux using only Persistent Memory, with no SSDs required. For this follow-up post, I attempted to install VMware vSphere/ESXi v7.0u2 onto the persistent memory.

TL;DR - It doesn’t work. The installer doesn’t list the PMem devices, and I was unable to find a way to manually select the PMem device(s).

I assume you followed the previous post to configure sector namespaces that we’ll use to install ESXi.

How to Confirm Virtual to Physical Memory Mappings for PMem and FSDAX Files

Are you curious whether your application’s memory-mapped files are really using Intel Optane Persistent Memory (PMem), Compute Express Link (CXL) Non-Volatile Memory Modules (NV-CMM), or another DAX-enabled persistent memory device? Want to understand how virtual memory maps onto physical, non-volatile regions? Let’s use easily adaptable scripts in both Python and C to confirm this on your Linux system, definitively.

Why Does This Matter?

With the advent of persistent memory and DAX (Direct Access) filesystems, applications can memory-map files directly onto PMem, bypassing the traditional DRAM page cache. This promises significant performance and durability improvements for data-intensive workloads and databases, such as SQLite, Redis, and others.

Using Linux Volume Manager (LVM) with Persistent Memory

In this article, we show how to use the Linux Volume Manager (LVM) to create concatenated, striped, and mirrored logical volumes using persistent memory modules as the backing storage device. Specifically, we will be using Intel® Optane™ Persistent Memory Modules on a two-socket system with Intel® Cascade Lake Xeon® CPUs, also referred to as 2nd Generation Intel® Xeon® Scalable Processors.

Contents