OS: RHEL8 Filesystem: xfs

I'm guessing that some zombie process is holding the space, but I can't find it.

The problem is with /var/lib/pgsql/14, which df says is using 1.1TB of space:

$ sudo df -T -xtmpfs -xdevtmpfs -h --sync  
Filesystem                                 Type  Size  Used Avail Use% Mounted on
/dev/sda2                                  xfs   126G   14G  113G  11% /
/dev/sda1                                  xfs   2.0G  658M  1.4G  33% /boot
/dev/mapper/pgsql14vg-pgsql141v            xfs   5.4T  1.1T  4.4T  20% /var/lib/pgsql/14
/dev/mapper/pglogvg-pglog1v                xfs    15G  292M   15G   2% /var/log/postgresql
FISPFILNAS01.xxxxxxxxxxxx:/DB_backups_TAPb nfs4   15T  2.3T   13T  16% /var/lib/pgsql/14/backups

However, du says it only has 12GB of files:

$ du -x -d2 -h /var/lib/pgsql/14 | sort -k2
12G     /var/lib/pgsql/14
12G     /var/lib/pgsql/14/data
42M     /var/lib/pgsql/14/data/base
2.8M    /var/lib/pgsql/14/data/global
0       /var/lib/pgsql/14/data/pg_commit_ts
0       /var/lib/pgsql/14/data/pg_dynshmem
4.0K    /var/lib/pgsql/14/data/pg_logical
16K     /var/lib/pgsql/14/data/pg_multixact
0       /var/lib/pgsql/14/data/pg_notify
0       /var/lib/pgsql/14/data/pg_replslot
0       /var/lib/pgsql/14/data/pg_serial
0       /var/lib/pgsql/14/data/pg_snapshots
0       /var/lib/pgsql/14/data/pg_stat
976K    /var/lib/pgsql/14/data/pg_stat_tmp
192K    /var/lib/pgsql/14/data/pg_subtrans
0       /var/lib/pgsql/14/data/pg_tblspc
0       /var/lib/pgsql/14/data/pg_twophase
12G     /var/lib/pgsql/14/data/pg_wal
112K    /var/lib/pgsql/14/data/pg_xact

I even tried umounting the device (after shutting down PostgreSQL), but it's busy, even though lsof says that no processes are using it:

$ export PS1="$ "
$ sudo lsof /var/lib/pgsql/14
$ 

Added lsof +L1 output:

$ sudo lsof +L1
COMMAND     PID    USER   FD   TYPE DEVICE SIZE/OFF NLINK      NODE NAME
dbus-daem  1182    dbus   22r   REG    8,2 11567160     0 135113565 /var/lib/sss/mc/initgroups (deleted)
polkitd    1887 polkitd    3r   REG    8,2 11567160     0 134903325 /var/lib/sss/mc/initgroups (deleted)
python.or  1899    root    6w   REG    8,2     1434     0  67157239 /var/log/venv-salt-minion.log-20231217 (deleted)
python.or  1899    root    7r   REG    8,2 11567160     0 134903325 /var/lib/sss/mc/initgroups (deleted)
sssd      16382    root   14r   REG    8,2 11567160     0 134903325 /var/lib/sss/mc/initgroups (deleted)
sssd_be   16383    root   18r   REG    8,2 11567160     0 134903325 /var/lib/sss/mc/initgroups (deleted)
RonJohn

2 Answers

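You have another filesystem mounted inside /var/lib/pgsql/14, at /var/lib/pgsql/14/backups. Anything written to the backups directory while the NFS share was not mounted sits on the xfs volume: df counts it, but du cannot see it because the mount hides it. Unmount /var/lib/pgsql/14/backups and check for unexpected files written there: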

Filesystem                                 Type  Size  Used Avail Use% Mounted on
/dev/mapper/pgsql14vg-pgsql141v            xfs   5.4T  1.1T  4.4T  20% /var/lib/pgsql/14
FISPFILNAS01.xxxxxxxxxxxx:/DB_backups_TAPb nfs4   15T  2.3T   13T  16% /var/lib/pgsql/14/backups
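
To spot nested mounts like this one without scanning the whole df table, the mount table can be filtered for mount points that lie below a given directory. This is a minimal sketch, assuming the Linux /proc/self/mounts format on stdin; nested_mounts is a made-up helper name, not a standard tool:

```shell
# Print mount points nested strictly below directory $1.
# Expects mount-table lines (device, mount point, type, ...) on stdin.
# Assumption: mount points contain no spaces (the kernel escapes them as \040).
nested_mounts() {
    awk -v top="${1%/}/" '$2 != top && index($2, top) == 1 { print $2 }'
}
```

For example, `nested_mounts /var/lib/pgsql/14 < /proc/self/mounts` should list /var/lib/pgsql/14/backups on the system in the question. Each directory it prints is a place where files may be hidden beneath a mount.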

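If you cannot unmount the nested filesystem (backups) because this is a production environment, you can temporarily bind-mount the parent filesystem (the one with the missing space) somewhere else, per the instructions at "du results on filesystem inconsistent with df". A plain bind mount does not bring submounts along, so du can see whatever lies beneath the backups mount point: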

mkdir /mnt/pgsql14
mount --bind /var/lib/pgsql/14 /mnt/pgsql14
du -x -d2 -h /mnt/pgsql14 | sort -k2

umount /mnt/pgsql14
rmdir /mnt/pgsql14
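
To put a number on the space that du cannot reach, one option is to subtract du's total from df's "used" figure for the same mount point. The following is only a sketch, assuming POSIX-style `df -kP` and `du -skx`; fs_gap_kib is a hypothetical name:

```shell
# Print (df "used" minus du total) in KiB for the filesystem mounted at $1.
# A small positive gap is normal (filesystem metadata). A very large one
# suggests files hidden under a mount point or deleted files still held open.
# The function body is a subshell, so its variables do not leak to the caller.
fs_gap_kib() (
    used=$(df -kP "$1" | awk 'NR == 2 { print $3 }')
    seen=$(du -skx "$1" 2>/dev/null | cut -f1)
    echo $(( used - seen ))
)
```

Run it against a mount point, for example `sudo sh -c '. ./gap.sh; fs_gap_kib /var/lib/pgsql/14'` (gap.sh being whatever file you saved the function in). On the system in the question the result would be roughly the difference between 1.1T and 12G.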

Chris Davies
  • That was the problem. I thought I'd cleaned out the old files from backups/, but obviously I hadn't. – RonJohn Jan 07 '24 at 15:19
  • So you mounted a filesystem over a non-empty directory, effectively hiding those files at the mount point? – U. Windl Jan 10 '24 at 07:07
  • @U.Windl yes. I've seen similar elsewhere, for example if backup fails to mount but a backups process runs anyway – Chris Davies Jan 10 '24 at 08:39
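
The failure mode in the last comment (a backup job writing into a directory whose filesystem failed to mount) can be guarded against by checking the mount before running the job. A minimal sketch, assuming the `mountpoint` utility from util-linux is installed; run_if_mounted is a made-up wrapper name:

```shell
# Run a command only if directory $1 is currently a mount point.
# Usage: run_if_mounted DIR COMMAND [ARGS...]
run_if_mounted() {
    dir=$1
    shift
    if mountpoint -q "$dir"; then
        "$@"
    else
        echo "refusing to run: $dir is not a mount point" >&2
        return 1
    fi
}
```

A cron job could then call `run_if_mounted /var/lib/pgsql/14/backups some-backup-command`, where some-backup-command stands in for whatever actually writes the backups. If the NFS share is missing, nothing lands on the underlying xfs volume.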
Run sudo lsof +L1.

You'll see all deleted-but-still-open files, and the processes holding them open.
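
To judge whether those files explain the gap, their sizes can be added up. This is a rough sketch that assumes the default `lsof +L1` column layout shown in the question (DEVICE in column 6, SIZE/OFF in column 7, NODE in column 9) and command names without spaces; deleted_open_bytes is a hypothetical name:

```shell
# Sum the sizes of deleted-but-open files from `lsof +L1` output on stdin.
# Each file is counted once per (device, inode), however many processes hold it.
# Assumption: column 7 holds a size in bytes, not a "0t..." file offset.
deleted_open_bytes() {
    awk '$1 != "COMMAND" && !seen[$6 SUBSEP $9]++ { total += $7 }
         END { printf "%.0f\n", total }'
}
```

Usage: `sudo lsof +L1 | deleted_open_bytes`. Fed the lsof output from the question, it reports 23135754 bytes (about 22 MiB), so deleted files cannot account for the missing space in this case.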

telcoM