17

I'm running Linux Mint 14 Nadia. The Linux partition is 10 GB. When the system starts, df reports 80% usage. The usage then slowly grows until it reaches 100% and the system becomes unusable (this happens on the order of days or weeks). After a reboot the usage resets to 80%.

The strangest thing of all is that du shows no change.

Here's the output of both commands (Windows and external-drive partitions elided):

# --- Just after reboot ---

$ df -h     
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1       9.8G  7.3G  2.0G  80% /
none            4.0K     0  4.0K   0% /sys/fs/cgroup
udev            428M  292K  428M   1% /dev
tmpfs            88M  1.3M   87M   2% /run
none            5.0M     0  5.0M   0% /run/lock
none            437M  288K  437M   1% /run/shm
none            100M   12K  100M   1% /run/user

$ sudo du -x   -d1 -h /
186M    /opt
512M    /var
11M /sbin
556K    /root
1.3G    /home
613M    /lib
8.0K    /media
4.6G    /usr
16K /lost+found
111M    /boot
39M /etc
4.0K    /mnt
60K /tmp
9.1M    /bin
4.0K    /srv
7.3G    /            # <-- note this


# --- After some time ---

$ df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1       9.8G  9.1G  199M  98% /
none            4.0K     0  4.0K   0% /sys/fs/cgroup
udev            428M  292K  428M   1% /dev
tmpfs            88M  1.3M   87M   2% /run
none            5.0M     0  5.0M   0% /run/lock
none            437M   27M  411M   7% /run/shm
none            100M   28K  100M   1% /run/user

$  sudo du -x   -d1 -h /
186M    /opt
511M    /var
11M /sbin
556K    /root
1.4G    /home
613M    /lib
8.0K    /media
4.6G    /usr
16K /lost+found
111M    /boot
39M /etc
4.0K    /mnt
520K    /tmp
9.1M    /bin
4.0K    /srv
7.3G    /              # <-- note this

(Note: I use hibernation. After hibernation the usage stays the same; after a reboot it resets to 80%.)

How do I track what eats the space?

I've read this question. I'm still in the dark. How do I find out which program is responsible for this behavior?

EDIT: found it. The space is claimed by the kernel log, which can be seen with dmesg. It fills up because my machine generates errors at a rate of about 5 per second. (It's related to this bug.) Future readers with a similar problem - slowly filling disk space unseen by du - should not forget to try dmesg when searching for the cause.
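A quick sketch of checks for this kind of symptom (standard Linux tools assumed; adjust the intervals to taste):

dmesg | tail -n 20                 # is the kernel logging errors continuously?
watch -n 5 'dmesg | tail -n 3'     # do new messages keep appearing?
watch -n 60 'df -h /'              # confirm the slow, steady growth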

Arry
  • 279

8 Answers

23

Repeated execution of

sudo du -x   -d1 -h /

(descending the directory tree) should tell you where the space is being consumed. That will probably make it clear, without further investigation, which application is causing it.
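For example (a sketch, using the directories from the question), piping through sort makes the biggest entries stand out, and you repeat one level deeper each time:

sudo du -x -d1 -h /    | sort -h     # largest top-level directories last
sudo du -x -d1 -h /usr | sort -h     # then descend into the biggest one, and so on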

invisible files

If du doesn't show the files, then one possibility is deleted files. A file (or rather its name, i.e. its entry in a directory) can be deleted while the file is still in use. As long as a valid file descriptor points at the file, it still occupies space on the volume (unless it is an empty file, of course).

cat >file &            # a background process keeps the file open for writing
ls -l file             # the file exists and has a directory entry
rm file                # delete the name; the open file descriptor survives
ls -l file             # the name is gone, but the data still occupies space
# PID of cat is 19834
ls -l /proc/19834/fd   # the deleted file still shows up among cat's descriptors
lrwx------ 1 hl hauke 64 11. Feb 19:16 0 -> /dev/pts/0
l-wx------ 1 hl hauke 64 11. Feb 19:16 1 -> /crypto/home/hl/tmp/file (deleted)
lrwx------ 1 hl hauke 64 11. Feb 19:15 2 -> /dev/pts/0

You can find these files with find:

find /proc/ -mindepth 3 -maxdepth 3 \
-regex '/proc/[1-9][0-9]*/fd/[0-9][0-9]*' -type l -lname '*(deleted)' \
-printf '%p\n     %l\n' 2>/dev/null

It may be one single huge file or a bunch of smaller files causing your problem. There are about 30 such files on my system right now (belonging to only five processes). ls -l shows the size of these files, but it does not seem to be possible to get this value from find itself.
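A minimal sketch that adds the sizes, assuming GNU find, xargs and coreutils stat (stat -L follows the /proc fd symlinks even when the target has been deleted):

find /proc/ -mindepth 3 -maxdepth 3 \
    -regex '/proc/[1-9][0-9]*/fd/[0-9][0-9]*' -type l -lname '*(deleted)' \
    -print0 2>/dev/null |
  xargs -0 -r stat -L -c '%s bytes  %n' 2>/dev/null |
  sort -n          # largest deleted-but-open files last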

After killing the process the space becomes available to the file system (df) again.

Hauke Laging
  • 90,279
  • This doesn't address my question. du reports no change in consumed space: 7.3G at start and 7.3G after time passes. df reports 7.3G used at start and up to 10G as time passes. I cannot find the problem with du. – Arry Feb 08 '14 at 12:21
  • @Arry Indeed, I read too fast. – Hauke Laging Feb 11 '14 at 19:15
14

Use something like

lsof -s | grep deleted | sort -k 8

to see which processes are keeping deleted files open. The important fields are the second (the PID) and the SIZE/OFF column (the file size).

(Pay attention to duplicated lines and don't count them twice; compare the PID together with the file path or inode number to spot duplicates.)

After that, if you find a process that is likely the culprit, we can see how to fix it.
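An alternative sketch that avoids the grep: lsof's +L1 option selects open files whose link count is zero, i.e. files that have been deleted but are still held open; the SIZE/OFF column shows how much space each one still occupies:

sudo lsof -nP +L1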

angus
  • 12,321
  • Good suggestion, but on my machine this command reports only 2 open deleted files with size of 2k, which is a far cry from 2.7G consumed over time by some stray process. – Arry Feb 08 '14 at 12:24
  • This was a great suggestion and it indeed helped me solve a problem similar to the parent question. I had a huge discrepancy between df and du commands. In my specific case, I have rotating logs and a service that forwards the logs (logstash in this example). The logstash service was keeping the rotated logs open, even when deleted. This was causing the discrepancy between du and df. Once the logstash service was restarted disk space showed up correctly. – aemus Mar 25 '15 at 21:00
  • I had a process writing an append-only-file that grew indefinitely and eventually filled my disk. Then I decided to rm that file but the process didn't close its file descriptor, so it was somehow still being used. Restarting the process and limiting the AOF size solved my problem. – aviggiano Jul 24 '15 at 14:49
  • Worth a note that this requires root privileges. Also, I used sudo lsof -s | grep deleted | sort -hk7 to get a numerical sort. Without -h, sort does funny lexical things with numbers. – Derek Oct 22 '18 at 14:18
4
find / -size +10000k -print0 | xargs -0 ls -l -h

Use this to recursively find everything under / (root) larger than about 10 MB and display it in detail with ls -l via xargs. Write +1000000k (two extra zeros) to find files larger than about 1 GB, for example.
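A variant of the same idea (a sketch): -xdev keeps find on the root filesystem so it doesn't descend into /proc or other mounts, and -type f restricts the search to regular files:

sudo find / -xdev -type f -size +100M -exec ls -lh {} + 2>/dev/null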

du / -h --max-depth=1 | sort -h

You can also use du, and just dig into it manually.

Kiwy
  • 9,534
Adionditsak
  • 3,935
2

Whenever this happens I always focus on certain sub-directories first. The FHS layout that most Linux distros adhere to is organized with this in mind.

First look in /var, followed by /home.

$ sudo du -x -d1 -h /var  | sort -hr

$ sudo du -x -d1 -h /home | sort -hr

You can narrow your focus to sub-directories within either of those locations as well. Once you've exhausted those, I usually move on to /root, and lastly to the remaining sub-directories at /.

If it's a Red Hat-based distro, the cache that yum uses for updates might be consuming a large amount of space. You can use this command to clear it:

$ yum clean packages

Distros that use apt can do something similar with apt-get clean.
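On an apt-based system such as Mint, for example (a sketch), the cache lives under /var/cache/apt/archives:

du -sh /var/cache/apt/archives      # how much space the cached .deb packages take
sudo apt-get clean                  # remove the cached package files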

I'd also run this command at the top of your / directory; this location can sometimes become a source of stray log files.

$ ls -la /

Pay special attention to dot files! Things named .blah, for example.
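A quick sketch that sizes the dot entries at / along with everything else (the glob skips . and ..; errors from non-matching patterns are discarded):

sudo du -xsh /.[!.]* /* 2>/dev/null | sort -h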

slm
  • 369,824
2

timeshift was eating my disk. To delete a single snapshot:

sudo timeshift --delete  --snapshot '2014-10-12_16-29-08'

To delete all snapshots:

sudo timeshift --delete-all
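To check which snapshots exist and how they are named, a sketch:

sudo timeshift --list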
Ed Harrod
  • 103
-1

I had almost the same situation.

In my case, the cause was VMware. One of the other VMs on the same machine was consuming the disk space; that's why my disk usage was at 100%.

After deleting large files from the neighbouring VM, everything worked correctly again.

pchero
  • 1
  • 1
-1

Beware of your NFS mounts

Keep your mount points in mind if you have NFS/Samba bind mounts and your processes write to those mount points.

In my case, I use deluge to download files to NFS-mounted storage. For some reason, the NFS share sometimes failed to mount during reboot, and deluge kept downloading large files into the unmounted directory, eating up all my free disk space.

When I was checking with du, I ignored that directory since it wasn't local storage but a bind mount onto NFS. Once I unmounted the NFS/Samba shares and cleaned up the underlying mount-point directory, everything went back to normal.
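A sketch of the kind of guard that would have prevented this; the /mnt/nas path and the deluged service name are hypothetical examples, adjust them to your setup:

#!/bin/sh
# Refuse to start the downloader unless the NFS share is actually mounted.
if mountpoint -q /mnt/nas; then
    systemctl start deluged
else
    echo "NFS share /mnt/nas is not mounted; not starting deluge" >&2
    exit 1
fi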

-3

Running on CentOS.

The problem was Docker containers building up log files (nginx).

docker system prune -a -f did not work (Total reclaimed space: 0B).

The following

systemctl stop docker.service
rm -rf /var/lib/docker          # WARNING: this removes ALL images, containers and volumes
systemctl start docker.service
reboot

finally freed 1.8 GB (!)
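A less destructive option (a sketch; the values are examples) is to cap the size of the json-file container logs in /etc/docker/daemon.json and restart Docker; the limits apply to containers created afterwards:

{
  "log-driver": "json-file",
  "log-opts": { "max-size": "10m", "max-file": "3" }
}

sudo systemctl restart docker.service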