Running Open WebUI and Ollama on Ubuntu 22.04 for a Local ChatGPT Experience

Introduction

Open WebUI and Ollama are powerful tools that let you build a ChatGPT-like chat experience with open large language models running entirely on your own machine. Whether you’re experimenting with natural language understanding or building your own conversational AI, they provide a user-friendly interface for interacting with these models. In this guide, we’ll walk you through the installation process step by step.

Ollama is a platform designed to run open-source large language models (LLMs) locally on your machine. It hides much of the complexity involved in deploying and managing these models, making it an attractive choice for researchers, developers, and anyone who wants to experiment with LLMs. Ollama runs natively on macOS and Linux, with Windows support on the horizon.

Open WebUI is an extensible, feature-rich, and user-friendly self-hosted WebUI designed to operate entirely offline. It supports various large language model (LLM) runners, including Ollama and OpenAI-compatible APIs.

Prerequisites

Before we dive into the installation, make sure you have the following prerequisites (a few quick commands to check them follow the list):

  1. System Requirements:
    • Your system can be CPU-only, or you may have an NVIDIA or AMD GPU.
    • We’ll be using Ubuntu 22.04 as the operating system.
    • At least 4 vCPUs; 8 or more are recommended.
    • You should have at least 8 GB of RAM available to run the 7B models, 16 GB to run the 13B models, and 32 GB to run the 33B models.
  2. Software:
    • Docker Engine and, if you have an NVIDIA GPU, the NVIDIA container runtime for Docker; both are covered in the installation steps below.
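
Before moving on, a few standard commands will confirm the CPU count, available memory, and Ubuntu release (purely a sanity check against the requirements above):

nproc
free -h
lsb_release -a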

Installation Steps

1. Install or Upgrade Docker Engine on Ubuntu

Follow Install Docker Engine on Ubuntu to install or upgrade Docker on your Ubuntu system.
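
If you just want the quick route, Docker also provides a convenience script that works on Ubuntu 22.04 (the linked guide covers the repository-based install; the hello-world run is only a smoke test):

curl -fsSL https://get.docker.com | sh
sudo docker run --rm hello-world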

2. Install GPU support for Docker

2.1 NVIDIA Container Runtime for Docker

The NVIDIA Container Runtime for Docker is an improved mechanism for allowing the Docker Engine to support NVIDIA GPUs used by GPU-accelerated containers. This new runtime replaces the Docker Engine Utility for NVIDIA GPUs.

Follow the official NVIDIA documentation to install or upgrade the NVIDIA Container Runtime for Docker.
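
On current systems this runtime is distributed as the NVIDIA Container Toolkit. Once the NVIDIA apt repository from that documentation is configured, the remaining steps typically look like the following (the CUDA image tag is only an example; pick one that matches your driver):

sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
sudo docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi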

3. Install Open WebUI and Ollama

There are several installation methods available depending on your environment. Use one of the options described below:

[Option 1] Installing Open WebUI with Bundled Ollama Support

This is the easiest and recommended method. It uses a single container image that bundles Open WebUI with Ollama, allowing for a streamlined setup via a single command. Choose the appropriate command for your hardware environment:

  • With NVIDIA GPU Support: Utilize GPU resources by running the following command:
sudo docker run -d -p 3000:8080 --gpus=all -v ollama:/root/.ollama -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:ollama
  • For CPU Only: If you’re not using a GPU, use this command instead:
sudo docker run -d -p 3000:8080 -v ollama:/root/.ollama -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:ollama

Verify the Docker instance is running:

$ sudo docker ps
CONTAINER ID   IMAGE                                  COMMAND           CREATED             STATUS             PORTS                                       NAMES
984c0d7006ef   ghcr.io/open-webui/open-webui:ollama   "bash start.sh"   About an hour ago   Up About an hour   0.0.0.0:3000->8080/tcp, :::3000->8080/tcp   open-webui
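
If the container is not listed, or keeps restarting, its logs usually explain why:

sudo docker logs open-webui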

[Option 2] Installation with the Default Open WebUI Configuration Using a Separate Ollama Instance

Install Ollama using:

curl -fsSL https://ollama.com/install.sh | sh
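
The install script registers Ollama as a systemd service listening on port 11434. Before starting Open WebUI, it is worth confirming that the service answers (the curl call should return a short "Ollama is running" message):

ollama --version
curl http://127.0.0.1:11434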

With Ollama now installed, use this command to start Open WebUI:

sudo docker run -d -p 3000:8080 --add-host=host.docker.internal:host-gateway -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:main
  • To connect to Ollama on a remote server, change the OLLAMA_BASE_URL to the remote server’s URL:
sudo docker run -d -p 3000:8080 -e OLLAMA_BASE_URL=https://example.com -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:main
  • To run Open WebUI with NVIDIA GPU support, use this command:
sudo docker run -d -p 3000:8080 --gpus all --add-host=host.docker.internal:host-gateway -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:cuda

Verify the Docker instance is running:

$ sudo docker ps
CONTAINER ID   IMAGE                                  COMMAND           CREATED             STATUS             PORTS                                       NAMES
984c0d7006ef   ghcr.io/open-webui/open-webui:main     "bash start.sh"   About an hour ago   Up About an hour   0.0.0.0:3000->8080/tcp, :::3000->8080/tcp   open-webui

[Option 3] Install using Docker Compose

1. Start by cloning the Open WebUI GitHub repository:

git clone https://github.com/open-webui/open-webui
cd open-webui

2. Choosing the Appropriate Docker Compose File

Open WebUI provides several Docker Compose files for different configurations. Depending on your hardware, choose the relevant file (the files can also be combined, as shown after the list):

  • docker-compose.amdgpu.yaml: For AMD GPUs
  • docker-compose.api.yaml: For API-only setup
  • docker-compose.data.yaml: For data services
  • docker-compose.gpu.yaml: For NVIDIA GPUs
  • docker-compose.yaml: Default configuration
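
Because these are ordinary Docker Compose files, they can also be layered with multiple -f flags. For example, a host with an NVIDIA GPU might combine the default file with the GPU override (a sketch; check the files in the repository for the exact services they define):

sudo docker compose -f docker-compose.yaml -f docker-compose.gpu.yaml up -d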

3. Starting the Docker Environment

Execute the following command to start the Docker Compose setup (volumes are declared inside the compose file itself, so the -v flag from the docker run examples is not needed here):

sudo docker compose -f docker-compose.yaml up -d

Verify the Docker instance is running:

$ sudo docker ps
CONTAINER ID   IMAGE                                  COMMAND           CREATED             STATUS             PORTS                                       NAMES
984c0d7006ef   ghcr.io/open-webui/open-webui:ollama   "bash start.sh"   About an hour ago   Up About an hour   0.0.0.0:3000->8080/tcp, :::3000->8080/tcp   open-webui

Remember to keep documents under /data/docs so they are not lost when the container stops.

4. Access the User Interface

Open your web browser and navigate to:

http://<your-host-IP>:3000
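
If the page does not load from another machine, first confirm that the service answers locally on the host, then check any firewall or security-group rules in between:

curl -I http://localhost:3000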

You should see the Login page:

5. Create a New User

Click “Sign up” to create a new local user account. Enter your Name, Email, and Password, then click “Create Account”.

You will be automatically logged in and taken to the home page.

OpenWebUI Landing Page

6. Downloading Ollama Models

There is a growing list of models to choose from. Explore the models available in Ollama’s library.

Import one or more models into Ollama using Open WebUI (a command-line alternative is shown after the list):

  1. Click the “+” next to the models drop-down in the UI.
  2. Alternatively, go to Settings -> Models -> “Pull a model from Ollama.com.”
OpenWebUI Import Ollama Models
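
If you prefer the terminal, models can also be pulled with the Ollama CLI. With the bundled image from Option 1 this means running the command inside the container; with a native Ollama install (Option 2) you can run it directly on the host. The model name below is just an example from the Ollama library:

# Option 1: bundled Ollama inside the open-webui container
sudo docker exec -it open-webui ollama pull llama3
# Option 2: Ollama installed directly on the host
ollama pull llama3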

Conclusion

Congratulations! You’ve successfully set up Open WebUI and Ollama for your local ChatGPT experience. Feel free to explore the capabilities of these tools and customize your chat environment. Happy chatting!

Understanding Memory Usage with `smem`

Memory management is crucial for Linux administrators and developers, especially when optimizing performance for resource-intensive applications. While tools like top and htop are commonly used to monitor system performance, they often don’t provide enough detail regarding memory usage breakdown. This is where smem comes into play.

What is smem?

smem is a command-line tool that reports memory usage per process and provides better insight into shared memory than most traditional tools, taking shared memory pages into account. Unlike top or htop, which primarily display RSS (Resident Set Size), smem can also show USS (Unique Set Size), which is a better metric for understanding how much memory would be freed if a particular process were terminated. This blog will guide you through using smem, explaining these critical memory metrics and providing comparisons to more familiar tools.
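
On Ubuntu, smem is available from the standard repositories. A typical first look sorts processes by USS, largest first (flags and column names per the smem man page):

sudo apt install smem
sudo smem -k -s uss -r -c "pid user name uss pss rss" | head -n 15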

How to Boot Linux from Intel® Optane™ Persistent Memory

Introduction

In this article, I will demonstrate how to configure a system with Intel Optane Persistent Memory (PMem) and use part of the PMem as a boot device. This little-known feature can reduce boot times for those who need it.

The basic steps, sketched with example commands after the list, include:

  • Configure the persistent memory in AppDirect Interleaved mode
  • Create two small SECTOR namespaces, one per region
  • Install the OS and select one or both of the namespaces (single-disk install or mirrored LVM)
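
As a rough sketch of those steps with the standard ipmctl and ndctl utilities (region names and namespace sizes below are placeholders):

# Provision all PMem as interleaved AppDirect regions (requires a reboot)
sudo ipmctl create -goal PersistentMemoryType=AppDirect
# After rebooting, create a small sector-mode namespace in each region
sudo ndctl create-namespace --mode=sector --region=region0 --size=20G
sudo ndctl create-namespace --mode=sector --region=region1 --size=20G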

Configure the Persistent Memory

The following figure shows how we will provision the persistent memory.

How to Confirm Virtual to Physical Memory Mappings for PMem and FSDAX Files

Are you curious whether your application’s memory-mapped files are really using Intel Optane Persistent Memory (PMem), Compute Express Link (CXL) Non-Volatile Memory Modules (NV-CMM), or another DAX-enabled persistent memory device? Want to understand how virtual memory maps onto physical, non-volatile regions? Let’s use easily adaptable scripts in both Python and C to confirm this on your Linux system, definitively.

Why Does This Matter?

With the advent of persistent memory and DAX (Direct Access) filesystems, applications can memory-map files directly onto PMem, bypassing the traditional DRAM page cache. This promises significant performance and durability improvements for data-intensive workloads and databases, such as SQLite, Redis, and others.
