
I am running k3s on a cluster of three Raspberry Pi 4s. I keep running into problems when the nodes report DiskPressure and pods are Evicted; however, I'm at a loss as to what is taking up the space on the 15G SD cards I'm using. I've tried all the obvious candidates (`/var/log` files, `journalctl --vacuum-size`, `docker system prune -af --volumes`), but I'm never able to get usage of the root filesystem much below 80%:

```
$ df -h | head
Filesystem                  Size  Used Avail Use% Mounted on
/dev/root                    15G   11G  3.0G  79% /
devtmpfs                    3.7G     0  3.7G   0% /dev
tmpfs                       3.9G     0  3.9G   0% /dev/shm
tmpfs                       1.6G  7.2M  1.6G   1% /run
tmpfs                       5.0M  4.0K  5.0M   1% /run/lock
/dev/mmcblk0p1              253M   32M  221M  13% /boot
...(other mounted filesystems, like external hard drives and NFS mounts)
```

I've been using `du --max-depth 1 -xh . 2>/dev/null` to track down large objects, but that has hit a dead end, especially since `df` and `du` are not intended to give matching results:

```
$ du --max-depth 1 / -xh 2>/dev/null
8.0K    /mnt
2.1G    /usr
4.0K    /media
4.0K    /opt
16K     /lost+found
6.0M    /etc
146M    /home
4.0K    /root
1.3G    /var
4.0K    /srv
40K     /tmp
3.5G    /
```

When `du` tells me that only 3.5G is being used but `df` reports that 11G is used, what alternative tools can I use to find junk to delete (or junk that is evidence of malfunctioning programs)?
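A minimal sketch that puts the two numbers side by side for one filesystem, assuming GNU coreutils (`df --output`, `du -B1`); `df_du_gap` is a hypothetical helper name, not an existing tool. Pass a mount point such as `/`: `df` always reports the whole filesystem that contains the path, while `du` only walks the tree below it.

```shell
#!/bin/sh
# Hypothetical helper: compare "used" according to df and du for one filesystem.
# Assumes GNU coreutils (df --output, du -B1). Pass a mount point, e.g. /.
df_du_gap() {
  local mnt df_used du_used
  mnt="${1:-/}"
  # Used bytes as the filesystem itself accounts for them (via statfs).
  df_used=$(df -B1 --output=used "$mnt" | tail -n 1 | tr -d ' ')
  # Bytes reachable by walking the tree, without leaving this filesystem (-x).
  # stderr is deliberately left alone so "Permission denied" errors stay visible.
  du_used=$(du -x -s -B1 "$mnt" | cut -f1)
  echo "df used: $df_used"
  echo "du used: $du_used"
  echo "gap:     $((df_used - du_used))"
}
```

Some gap is expected, since `df` and `du` are not intended to match exactly; a gap of several GB is what is worth chasing. Run it as root: an unprivileged `du` skips every directory it cannot read, which inflates the gap.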

Google is not particularly helpful here: most answers centre around `du` or `ls` (which gives a similar view to `du`), or around using `find` to locate large files (moderately helpful, but not useful if the problem is a proliferation of small files). Even [ncdu](https://unix.stackexchange.com/a/125451/30828) agrees with `du` that only ~3.5G is in use. As per this guide, I tried to find any files that have been deleted (and so are "seen" by `df` but not by `du`), but came up (nearly) empty:

```
$ sudo lsof -w | grep -i 'deleted'
systemd-j    155                              root   27u      REG              179,2    33554432      37340 /var/log/journal/539cc463fa774d11a5642e3744db7544/user-1000@f197a92838804bf28f92299ece25a807-000000000005daa8-0005f1ce57c1e95c.journal (deleted)
```
Asked by scubbo
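`lsof` can do this filtering itself: `lsof +L1` selects open files whose link count is below one, i.e. files that have been unlinked but are still held open. The same information is available straight from `/proc`. The sketch below is Linux only and assumes GNU `stat`; `deleted_open_files` is a hypothetical name. It prints the size in bytes, the file-descriptor path and the original name of each such file, so the sizes can be summed:

```shell
#!/bin/sh
# Hypothetical helper: list files that are deleted but still held open.
# Reads /proc/<pid>/fd directly (Linux only); run as root to see every process.
deleted_open_files() {
  local fd target size
  for fd in /proc/[0-9]*/fd/*; do
    # Processes and descriptors can vanish mid-loop; just skip them.
    target=$(readlink "$fd" 2>/dev/null) || continue
    case "$target" in
      *' (deleted)')
        # stat -L follows the /proc link to the still-open inode.
        size=$(stat -L -c '%s' "$fd" 2>/dev/null) || continue
        printf '%s\t%s\t%s\n' "$size" "$fd" "$target"
        ;;
    esac
  done
}
```

For example, `deleted_open_files | sort -rn | head` (as root) lists the largest entries, and piping into `awk -F'\t' '{s += $1} END {print s}'` totals them. The total can overcount: a file held by several descriptors or processes appears once per descriptor, and entries such as `/memfd:...` live in memory rather than on the SD card. The journal file in the output above is 33554432 bytes (32 MiB), nowhere near the missing ~7.5G.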

1 Answer


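You are running `du --max-depth 1 / -xh 2>/dev/null` as an ordinary user, so there will be plenty of directories that it cannot traverse because of permission restrictions. `du` leaves their contents out of its totals, and because stderr is redirected to `/dev/null`, the "Permission denied" errors that would reveal this are never shown. You must run this command as root.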

Answered by Bib
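A minimal sketch of the answer's advice, assuming GNU coreutils; `du_top` and `du_denied_count` are hypothetical helper names, not existing tools. The second function keeps stderr instead of discarding it, so an unprivileged run reports how many directories it could not see:

```shell
#!/bin/sh
# Hypothetical helpers illustrating the answer; run them as root.

# Largest directories one level below $1 (default /), staying on that
# filesystem (-x). Sizes are in KiB (-k) so a plain numeric sort works.
du_top() {
  du -x -k -d 1 "${1:-/}" | sort -rn | head -n "${2:-15}"
}

# Count the directories du could not read, rather than discarding stderr.
# LC_ALL=C keeps du's messages in English so the pattern matches.
du_denied_count() {
  LC_ALL=C du -x -s "${1:-/}" 2>&1 >/dev/null | grep -c 'Permission denied'
  return 0   # grep -c exits 1 when the count is 0; the printed number is what matters
}
```

For example, `sudo sh -c '. ./du-helpers.sh && du_top / 20'`, where `du-helpers.sh` is wherever the functions were saved. A non-zero `du_denied_count /` as an ordinary user is exactly the undercounting the answer describes.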
  • D'oh, that seems so obvious in hindsight, thank you! Unfortunately `sudo du ...` is hanging, and `sudo strace -u root sudo -k du ...` hangs on a line that reads `ppoll([{fd=-1}, {fd=3, events=POLLIN}], 2, NULL, NULL, 8`, which I'm trying to interpret; a negative file descriptor seems impossible/error-flavoured! Regardless, this is a perfect answer to the original question, and I'll accept it as such. Thanks! – scubbo Jan 11 '23 at 02:58
  • @scubbo There are a few dirs that you should not run `du` against. The output should tell you what has completed, and therefore which directory comes next and is causing the hang. I tend to run `du -s dir1 dir2 dir3` etc. You really do not want to run it against `/proc`, for instance. Divide and conquer... – Bib Jan 11 '23 at 10:50
  • @Bib what's the problem with running it through `/proc`? I've never seen any. The worst that can happen is that it prints a few lines to stderr about files that disappeared during the run. Also, there is no predetermined order, so you can't know what comes next even if you see what has been completed. – Nikita Kipriyanov Jan 13 '23 at 05:46
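A sketch of the divide-and-conquer suggestion that also addresses the ordering point: each top-level directory is sized separately, in sorted glob order, with its name printed first and a time limit on each `du`, so a hang is pinned to one path. It assumes GNU coreutils (`timeout`, `stat -c`); `du_each` and the 120-second default are arbitrary, hypothetical choices. Because `du -x` only stays on the filesystem of each argument, the loop itself skips directories on a different device (such as `/proc`, `/sys` or an NFS mount) rather than handing them to `du`. The original `du / -xh` never entered `/proc` either, since `/proc` is a separate filesystem.

```shell
#!/bin/sh
# Hypothetical helper: size each directory directly under $1 separately,
# in a fixed (sorted glob) order, with a time limit per directory.
# Assumes GNU coreutils (timeout, stat -c). Run as root for complete totals.
du_each() {
  local root limit rootdev dir info rc
  root="${1:-/}"
  root="${root%/}"                      # "/" becomes "", so "$root"/* is /bin, /etc, ...
  limit="${2:-120}"                     # seconds allowed per directory (arbitrary)
  rootdev=$(stat -c '%d' "$root/") || return 1
  for dir in "$root"/* "$root"/.[!.]*; do
    # stat without -L reports a symlink as "symbolic link", so links are skipped;
    # the short timeout guards against a mount point that itself hangs (e.g. NFS).
    info=$(timeout 5 stat -c '%d %F' "$dir" 2>/dev/null) || continue
    case "$info" in
      "$rootdev directory") ;;          # a real directory on the same filesystem
      *) continue ;;                    # other filesystem, file, symlink or no match
    esac
    printf 'checking %s\n' "$dir" >&2
    rc=0
    timeout "$limit" du -x -s -k "$dir" || rc=$?
    if [ "$rc" -eq 124 ]; then          # 124: timeout(1) killed du
      printf 'TIMEOUT after %ss: %s\n' "$limit" "$dir" >&2
    fi
  done
}
```

For example, `sudo sh -c '. ./du-each.sh && du_each / 60'`. Any directory reported as TIMEOUT is the one to investigate next, by running `du_each` on it in turn.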