Ollama
Ollama is an application that lets you run large language models locally, without an internet connection.
Installation
- Install ollama to run models on the CPU.
- To run models on a GPU:
  - Install ollama-cuda for NVIDIA.
  - Install ollama-rocm for AMD.
Next, enable/start ollama.service. Then, verify Ollama's status:
$ ollama --version
If it prints Warning: could not connect to a running Ollama instance, the Ollama service is not running; otherwise, the service is running and ready to accept requests.
Next, verify that you can run models. The following command downloads the latest Distilled DeepSeek-R1 model and returns an Ollama prompt that allows you to talk to the model:
$ ollama run deepseek-r1:1.5b
>>> Send a message (/? for help)
Usage
The Ollama executable does not provide a search interface; there is no ollama search command. To search for a model, you need to visit the Ollama search page.
To run a model:
$ ollama run model
To stop a model:
$ ollama stop model
To update a model:
$ ollama pull model
To remove a model:
$ ollama rm model
Troubleshooting
ROCm is not utilizing my AMD integrated GPU
You may have used utilities like amdgpu_top to monitor the utilization of your integrated GPU during an Ollama session, only to notice that the integrated GPU was not used at all.
That is expected: without additional configuration, ROCm simply ignores integrated GPUs, so everything is computed on the CPU.
The required configuration is, however, very simple: all you need is a drop-in file for ollama.service:
/etc/systemd/system/ollama.service.d/override_gfx_version.conf
[Service]
Environment="HSA_OVERRIDE_GFX_VERSION=X.Y.Z"
where X.Y.Z depends on the GFX version shipped with your system.
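As a hypothetical example, a system whose GPU reports gfx1035 (and hence GFX version 10.3.5, per the interpretation rules below) would use the following drop-in:

```
[Service]
Environment="HSA_OVERRIDE_GFX_VERSION=10.3.5"
```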
To determine which GFX version to use, first make sure rocminfo is installed. It should already be pulled into your system as a dependency of rocblas, which is itself a dependency of ollama-rocm.
Next, query the actual GFX version of your system:
$ /opt/rocm/bin/rocminfo | grep amdhsa
Note the digits printed after the word gfx: they encode the actual GFX version of your system and are interpreted as follows:
- If there are four digits, they are interpreted as XX.Y.Z, where the first two digits form the X part.
- If there are three digits, they are interpreted as X.Y.Z.
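The interpretation above can be sketched in shell; gfx1035 is a hypothetical value standing in for whatever rocminfo reports on your system:

```shell
# Hypothetical sketch of the digit interpretation above.
gfx=gfx1035                 # hypothetical token reported by rocminfo
digits=${gfx#gfx}           # strip the "gfx" prefix -> "1035"
if [ ${#digits} -eq 4 ]; then
    # four digits: XX.Y.Z, where the first two digits are the X part
    ver="${digits:0:2}.${digits:2:1}.${digits:3:1}"
else
    # three digits: X.Y.Z
    ver="${digits:0:1}.${digits:1:1}.${digits:2:1}"
fi
echo "$ver"                 # prints 10.3.5
```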
Then, find all installed rocblas kernels:
$ find /opt/rocm/lib/rocblas/library -name 'Kernels.so-*'
You need to set X.Y.Z to one of the available versions listed there. The rules are summarized as follows:
- The X part must be strictly equal to the actual version.
- The Y part may differ, but must be no greater than the actual version.
- The Z part may differ, but must be no greater than the actual version.
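The rules above can be sketched as a shell check; both version strings here are hypothetical placeholders:

```shell
# Hypothetical sketch: check whether an override version is acceptable
# for a given actual GFX version, per the rules above.
actual=10.3.5       # hypothetical actual GFX version of the system
override=10.3.0     # hypothetical candidate from the Kernels.so-* listing
IFS=. read -r aX aY aZ <<< "$actual"
IFS=. read -r oX oY oZ <<< "$override"
# X must match exactly; Y and Z may be lower, but never higher.
if [ "$oX" -eq "$aX" ] && [ "$oY" -le "$aY" ] && [ "$oZ" -le "$aZ" ]; then
    result=acceptable
else
    result=not-acceptable
fi
echo "$result"      # prints acceptable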
After setting the correct X.Y.Z, perform a daemon-reload and restart ollama.service.
Then, run your model as usual. You may wish to monitor GPU utilization with amdgpu_top again.