Executive summary:
- Every tool I've tried confirms lots of inodes in use on this ext4 partition
- Every tool I've tried shows me that there are no files on the partition
- It's not files held open and it's not an overlay mount
Long story:
I have an SSD with a single ext4 partition. This drive was being used to continually store video from cameras, in short clips, and a cron job would periodically delete the oldest clips (in a C application, which deleted them by calling remove()). After a while someone noticed that while there should have been about 5 days' worth of video backed up, there was hardly any, but the drive was almost full.
I took a look and naively tried just removing lost+found, but the drive was still full. So, I deleted everything (rm -rf *), but df -i tells me that 91230 inodes are in use, even though ls and du show nothing at all.
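The mismatch described above can be measured directly. This is a minimal sketch, assuming Python 3 on Linux; the function names are made up for illustration. It compares the allocated-inode figure from statvfs (total minus free, which should match what df -i shows as used) with the inodes a directory walk can actually reach.

```python
import os


def used_inodes(mountpoint):
    """Allocated inodes as the filesystem reports them (total minus free)."""
    st = os.statvfs(mountpoint)
    return st.f_files - st.f_ffree


def reachable_inodes(mountpoint):
    """Set of inode numbers that a directory walk from mountpoint can reach."""
    root = os.lstat(mountpoint)
    seen = {root.st_ino}
    for dirpath, dirnames, filenames in os.walk(mountpoint):
        same_fs_dirs = []
        for name in dirnames + filenames:
            try:
                st = os.lstat(os.path.join(dirpath, name))
            except OSError:
                continue  # entry vanished or is unreadable
            if st.st_dev != root.st_dev:
                continue  # another filesystem mounted below; ignore it
            seen.add(st.st_ino)
            if name in dirnames:
                same_fs_dirs.append(name)
        dirnames[:] = same_fs_dirs  # stop os.walk from crossing mount points
    return seen
```

Calling used_inodes("/mnt/video") and len(reachable_inodes("/mnt/video")) on the mounted partition (the path is a placeholder) gives the two numbers to compare. A small gap is expected on ext4, since the lowest inode numbers are reserved, but a gap in the tens of thousands is the symptom described here.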
e2fsck -fv found no errors to fix (aside from creating lost+found again), and dumpe2fs and tune2fs -l both agree with df -i on the number of used inodes. I've tried e2fsck -b with a couple of the backup superblocks and it didn't seem to make any difference.
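For cross-checking the superblock figures against df -i, here is a rough sketch that pulls the "Inode count" and "Free inodes" fields out of tune2fs -l output. It assumes Python 3.7+ and that tune2fs is run with enough privileges to read the device; the helper names are hypothetical.

```python
import subprocess


def parse_superblock_fields(text):
    """Turn 'Key:   value' lines from tune2fs -l into a dict of strings."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines without a colon, such as the version banner
            fields[key.strip()] = value.strip()
    return fields


def inode_usage(text):
    """Return (total, free, used) inode counts from tune2fs -l output."""
    fields = parse_superblock_fields(text)
    total = int(fields["Inode count"])
    free = int(fields["Free inodes"])
    return total, free, total - free


def read_superblock(device):
    """Run tune2fs -l on a device (usually needs root) and return its output."""
    result = subprocess.run(
        ["tune2fs", "-l", device],
        check=True, capture_output=True, text=True,
    )
    return result.stdout
```

Something like inode_usage(read_superblock("/dev/sdX1")) (the device name is a placeholder) would then give the used count to compare with df -i.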
baobab shows the same used space as df in the summary view, but when I click on the partition to see where the space is used, it only shows the 4.1kB used by the empty lost+found directory.
The problem is not deleted files that are still held open - nothing is open. I've mounted and unmounted this partition multiple times, and even taken the drive out and put it in a completely different machine.
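For anyone who wants to rule out the held-open case without remounting, this is one possible check, sketched under the assumption of a Linux /proc where the link target of a deleted-but-open file ends in " (deleted)". The function name and the proc parameter are illustrative only.

```python
import os


def deleted_open_files(mountpoint, proc="/proc"):
    """List (pid, fd, target) for open files under mountpoint that were deleted."""
    prefix = os.path.join(os.path.realpath(mountpoint), "")
    hits = []
    for pid in os.listdir(proc):
        if not pid.isdigit():
            continue  # not a process directory
        fd_dir = os.path.join(proc, pid, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited or we lack permission
        for fd in fds:
            try:
                target = os.readlink(os.path.join(fd_dir, fd))
            except OSError:
                continue  # descriptor closed in the meantime
            if target.endswith(" (deleted)") and target.startswith(prefix):
                hits.append((int(pid), int(fd), target))
    return hits
```

Run as root so that other users' processes are visible. An empty list is consistent with what is described above: no process is holding deleted files open on the partition.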
I know I could just re-format the partition and start fresh, but I would really like to understand what's going on here and whether there's any "proper" way to fix this - I don't care whether it brings the files for those inodes back or it makes them properly deleted so they don't use up all the space.
Edit:
Running dump creates a backup file roughly equal in size to the used space reported by df et al. Then running restore to a different drive created a chain of directories that's clearly wrong (/media/usb0/20150426/10/1_20150426_100125.264/20150426/10/1_20150426_100125.264/ and it continues many levels deep, the same structure repeating), before printing a bunch of lines like:
expected next file 7823361, got 7610674
expected next file 7823361, got 7610675
(second number incrementing - it goes back well beyond my terminal's buffer) before finally:
cannot find directory inode 11
abort? [yn]
Choosing n results in more "cannot find directory inode x", so I aborted.
Giving up and writing this off as a freak file-system corruption which hopefully won't happen again.
debugfs has no way of listing inodes on a device (however it does appear "easy to implement"). The "solution" there was to call stat on every possible inode number, to check whether it is used, and if so, for which entry. debugfs's icheck command could also help you check block-by-block instead. Also note that formatting is a rather "proper" way to handle this situation, since it does exactly what you want: reset the inode tables. – John WH Smith Sep 09 '15 at 09:53
fsck after recreating lost+found with mklost+found? – Gilles 'SO- stop being evil' Sep 09 '15 at 23:00
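The first comment's idea of probing inode numbers one by one could be scripted roughly as follows. This is only a sketch: it assumes Python 3.7+ with debugfs installed and run as root, the helper names are made up, and it returns raw debugfs output rather than trying to interpret it. Without -w, debugfs opens the filesystem read-only.

```python
import subprocess


def stat_request(inode):
    """debugfs request that prints the on-disk contents of one inode."""
    return f"stat <{inode}>"


def ncheck_request(inodes):
    """debugfs request that maps inode numbers to path names, if any exist."""
    return "ncheck " + " ".join(str(i) for i in inodes)


def debugfs_argv(device, request):
    """Command line for a single debugfs request (read-only, no -w)."""
    return ["debugfs", "-R", request, device]


def run_debugfs(device, request):
    """Run one request against the device and return debugfs's stdout."""
    result = subprocess.run(
        debugfs_argv(device, request),
        check=True, capture_output=True, text=True,
    )
    return result.stdout
```

For example, run_debugfs("/dev/sdX1", stat_request(7610674)) would dump one of the inode numbers that restore complained about (the device name is a placeholder). Looping over every possible inode this way starts one debugfs process per inode, so it will be slow on a large filesystem.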