Ollama
Ollama is an application that lets you run large language models locally, without an internet connection.
Installation
- Install ollama to run models on the CPU.
- To run models on a GPU:
  - Install ollama-cuda for NVIDIA.
  - Install ollama-rocm for AMD.
Next, enable/start ollama.service. Then, verify Ollama's status:
$ ollama --version
If it prints Warning: could not connect to a running Ollama instance, the Ollama service is not running; otherwise, the service is running and ready to accept requests.
Next, verify that you can run models. The following command downloads the latest Distilled DeepSeek-R1 model and returns an Ollama prompt that allows you to talk to the model:
$ ollama run deepseek-r1:1.5b
>>> Send a message (/? for help)
Usage
The Ollama executable does not provide a search interface; there is no ollama search command. To search for a model, you need to visit the Ollama search page.
To run a model:
$ ollama run model
To stop a model:
$ ollama stop model
To update a model:
$ ollama pull model
To remove a model:
$ ollama rm model
Troubleshooting
ROCm is not utilizing my AMD integrated GPU
You may have used utilities like amdgpu_top to monitor the utilization of your integrated GPU during an Ollama session, only to notice that the integrated GPU was not used at all.
That is expected: without additional configuration, ROCm simply ignores integrated GPUs, so everything is computed on the CPU.
The required configuration is, however, very simple: all you need is a drop-in file for ollama.service:
/etc/systemd/system/ollama.service.d/override_gfx_version.conf
[Service]
Environment="HSA_OVERRIDE_GFX_VERSION=X.Y.Z"
where X.Y.Z depends on the GFX version shipped with your system.
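As a hypothetical example, a system whose GPU reports gfx1035 (and hence GFX version 10.3.5, per the interpretation rules below) would use the following drop-in:

```
[Service]
Environment="HSA_OVERRIDE_GFX_VERSION=10.3.5"
```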
To determine which GFX version to use, first make sure rocminfo is installed. It should already be pulled into your system as a dependency of rocblas, which is itself a dependency of ollama-rocm.
Next, query the actual GFX version of your system:
$ /opt/rocm/bin/rocminfo | grep amdhsa
Note the digits printed after the word gfx: they encode the actual GFX version of your system and are interpreted as follows:
- If there are four digits, they are interpreted as XX.Y.Z, where the first two digits form the X part.
- If there are three digits, they are interpreted as X.Y.Z.
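The interpretation above can be sketched in shell; gfx1035 is a hypothetical value standing in for whatever rocminfo reports on your system:

```shell
# Hypothetical sketch of the digit interpretation above.
gfx=gfx1035                 # hypothetical token reported by rocminfo
digits=${gfx#gfx}           # strip the "gfx" prefix -> "1035"
if [ ${#digits} -eq 4 ]; then
    # four digits: XX.Y.Z, where the first two digits are the X part
    ver="${digits:0:2}.${digits:2:1}.${digits:3:1}"
else
    # three digits: X.Y.Z
    ver="${digits:0:1}.${digits:1:1}.${digits:2:1}"
fi
echo "$ver"                 # prints 10.3.5
```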
Then, find all installed rocblas kernels:
$ find /opt/rocm/lib/rocblas/library -name 'Kernels.so-*'
You need to set X.Y.Z to one of the available versions listed there. The rules are summarized as follows:
- The X part must be strictly equal to the actual version.
- The Y part may differ, but must be no greater than the actual version.
- The Z part may differ, but must be no greater than the actual version.
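The rules above can be sketched as a shell check; both version strings here are hypothetical placeholders:

```shell
# Hypothetical sketch: check whether an override version is acceptable
# for a given actual GFX version, per the rules above.
actual=10.3.5       # hypothetical actual GFX version of the system
override=10.3.0     # hypothetical candidate from the Kernels.so-* listing
IFS=. read -r aX aY aZ <<< "$actual"
IFS=. read -r oX oY oZ <<< "$override"
# X must match exactly; Y and Z may be lower, but never higher.
if [ "$oX" -eq "$aX" ] && [ "$oY" -le "$aY" ] && [ "$oZ" -le "$aZ" ]; then
    result=acceptable
else
    result=not-acceptable
fi
echo "$result"      # prints acceptable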
After setting the correct X.Y.Z, perform a daemon-reload and restart ollama.service.
Then, run your model as usual. You may wish to monitor GPU utilization with amdgpu_top again.