Blog Posts

Remote Development Using VS Code and SSH with AWS EC2

Remote Development Using VS Code and SSH with AWS EC2

How to Perform Remote Code Development Using VS Code on a Remote AWS EC2 Instance via SSH

Remote development has become a crucial tool for developers, enabling the convenience of coding and deploying directly to remote environments. In this blog, I’ll walk you through the process of setting up Visual Studio Code (VS Code) to develop remotely on an AWS EC2 instance using SSH. By the end of this guide, you’ll be able to connect to your EC2 instance from VS Code and develop code as if you were working locally.

Read More
How Much RAM Could a Vector Database Use If a Vector Database Could Use RAM

How Much RAM Could a Vector Database Use If a Vector Database Could Use RAM

Featured image generated by ChatGPT 4o model: “a low poly woodchuck by a serene lake, surrounded by mountains and a forest with tree leaves made from DDR memory modules. The woodchuck is munching on a memory DIMM. The only memory DIMM in the image should be the one being eaten.”

How Much RAM Could a Vector Database Use If a Vector Database Could Use RAM?

Although the title is a punn from the famous “woodchuck rhyme,” the question is serious for LLM applications using vector databases. As large language models (LLMs) continue to evolve, leveraging vector databases to store and search embeddings is critical. Understanding the memory usage of these systems is essential for maintaining performance, response times, and ensuring system scalability.

Read More
Understanding Memory Usage with `smem`

Understanding Memory Usage with `smem`

Memory management is crucial for Linux administrators and developers, especially when optimizing performance for resource-intensive applications. While tools like top and htop are commonly used to monitor system performance, they often don’t provide enough detail regarding memory usage breakdown. This is where smem comes into play.

What is smem?

smem is a command-line tool that reports memory usage per process and provides better insight into shared memory than most traditional tools, taking shared memory pages into account. Unlike top or htop, which primarily display RSS (Resident Set Size), smem can also show USS (Unique Set Size), which is a better metric for understanding how much memory would be freed if a particular process were terminated. This blog will guide you through using smem, explaining these critical memory metrics and providing comparisons to more familiar tools.

Read More
Benchmarking GPUs: Measuring Throughput between CPU and GPU

Benchmarking GPUs: Measuring Throughput between CPU and GPU

This article was inspired by a LinkedIn post by Dennis Kennetz . The CPU to GPU bandwidth check is available on GitHub which uses a specific flow to assess the data transfer rates. Like many in the industry, my focus is on AI and ML workloads and how we can improve efficiencies and performance using DRAM, CXL, CPU, GPUs, and software improvements.

In the rapidly evolving landscape of artificial intelligence (AI) and machine learning (ML), the ability to process vast amounts of data efficiently is paramount. As AI models grow in complexity and size, the demand for high-performance computing resources intensifies. At the heart of this demand lies the crucial task of optimizing data transfers between various components of a computing system, particularly from DRAM, CPU, and emerging technologies like CXL (Compute Express Link) to and from the GPU.

Read More
50 Generative AI Prompts I use as a Product Manager to Improve Efficiency and Product Quality

50 Generative AI Prompts I use as a Product Manager to Improve Efficiency and Product Quality

As a product manager, the success of our products depends on our ability to make informed, strategic decisions quickly and efficiently. In the fast-paced world of the IT industry, it’s crucial to stay ahead of trends, understand our customers deeply, and align our product development with our business goals. This is where generative AI and a well-crafted set of prompts have become invaluable tools in my daily workflow.

Why I Use These Prompts Daily

Every day, I’m faced with decisions that can significantly impact the trajectory of our product roadmap, customer satisfaction, and ultimately, our bottom line. The prompts I use are not just about generating ideas but about driving actionable insights that are tailored to our specific needs. These prompts cover a wide range of product management aspects—from prioritizing features based on customer feedback to crafting a go-to-market strategy that aligns with our business objectives.

Read More
Linux Kernel 6.10 is Released: This is What's New for Compute Express Link (CXL)

Linux Kernel 6.10 is Released: This is What's New for Compute Express Link (CXL)

The Linux Kernel 6.10 release brings several improvements and additions related to Compute Express Link (CXL) technology.

Here is the detailed list of all commits merged into the 6.10 Kernel for CXL and DAX. This list was generated by the Linux Kernel CXL Feature Tracker .

Read More
Linux Kernel 6.9 is Released: This is What's New for Compute Express Link (CXL)

Linux Kernel 6.9 is Released: This is What's New for Compute Express Link (CXL)

The Linux Kernel 6.9 release brings several improvements and additions related to Compute Express Link (CXL) technology.

New Features

Here is a list of new features for CXL:

Here is the detailed list of all commits merged into the 6.9 Kernel for CXL and DAX. This list was generated by the Linux Kernel CXL Feature Tracker .

Read More
Running Open WebUI and Ollama on Ubuntu 22.04 for a Local ChatGPT Experience

Running Open WebUI and Ollama on Ubuntu 22.04 for a Local ChatGPT Experience

Introduction

Open WebUI and Ollama are powerful tools that allow you to create a local chat experience using GPT models. Whether you’re experimenting with natural language understanding or building your own conversational AI, these tools provide a user-friendly interface for interacting with language models. In this guide, we’ll walk you through the installation process step by step.

Ollama is a cutting-edge platform designed to run open-source large language models locally on your machine. It simplifies the complexities involved in deploying and managing these models, making it an attractive choice for researchers, developers, and anyone who wants to experiment with language models1. Ollama provides a user-friendly interface for running large language models (LLMs) locally, specifically on MacOS and Linux (with Windows support on the horizon).

Read More
Tags