What is the least expensive way to find the oldest file in a directory, including all the directories underneath it? Assume the directory is backed by a SAN and is under heavy load.

There is a concern that "ls" could be taking locks and causing system degradation under heavy load.

Edit: find performs very well in a simple test case: finding the oldest file amongst 400 GB of files on an SSD drive took 1/20 of a second. But this is a MacBook Pro laptop under no load... so it's a bit of an apples-to-oranges test case.

And, as an aside, what is the best way to find out the implementations (underlying algorithms) of such commands?

2 Answers

With zsh:

oldest=(**/*(.DOm[1]))

This gives the oldest regular file by modification time (zsh's time resolution here is one second).
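
If the glob qualifier syntax is unfamiliar, here is the same command annotated; the final print line is added here only to show the result:

# **/*        match everything in the current directory and all subdirectories
# (.DOm[1])   glob qualifiers:
#   .           plain (regular) files only
#   D           include dot files and descend into dot directories
#   Om          order by modification time, oldest first
#   [1]         keep only the first (oldest) match
oldest=(**/*(.DOm[1]))
print -r -- $oldest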

With GNU tools:

(export LC_ALL=C
 find . -type f -printf '%T@\t%p\0' |
   sort -zg | tr '\0\n' '\n\0' | head -n 1 |
   cut -f2- | tr '\0' '\n')
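
The tr juggling is there so that head and cut can work on NUL-delimited records, which keeps the pipeline safe for filenames containing newlines. If you can assume your filenames are newline-free, a simpler sketch along the same lines (still assuming GNU find and sort) is:

# Assumes no newlines in filenames. %T@ prints the modification time in
# seconds since the epoch, so a numeric sort puts the oldest file first;
# cut then strips the timestamp column.
find . -type f -printf '%T@\t%p\n' | sort -g | head -n 1 | cut -f2-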

To minimize the number of external processes, you may be able to optimize by running a custom script instead of a proper find. The directory traversal and stat() of each file cannot be optimized away, but you only need to keep the oldest file so far in memory.

Here is an attempt in Perl:

find2perl -eval 'BEGIN { our ($filename, $oldest); }
    my @s=stat(_); if (! defined $::oldest || $s[9] < $::oldest) {
        $::oldest=$s[9]; $::filename = $File::Find::name }
    END { print "$::filename\n" }' | perl

In my tests, on a moderately large directory (129019 nodes), this is actually about 50% slower than @StephaneChazelas' "GNU tools" version, but you may find that it works better in some scenarios, especially for really large directories.
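
If you want to run a similar comparison on your own tree, something like the following should work. The commands are the ones from the answers above; /san/data is a hypothetical starting directory, and bash's time keyword reports the elapsed time of each whole pipeline:

# Time the "GNU tools" pipeline (replace /san/data with your directory).
time (export LC_ALL=C
 find /san/data -type f -printf '%T@\t%p\0' |
   sort -zg | tr '\0\n' '\n\0' | head -n 1 |
   cut -f2- | tr '\0' '\n')

# Time the find2perl variant on the same directory.
time find2perl /san/data -eval 'BEGIN { our ($filename, $oldest); }
    my @s=stat(_); if (! defined $::oldest || $s[9] < $::oldest) {
        $::oldest=$s[9]; $::filename = $File::Find::name }
    END { print "$::filename\n" }' | perl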

tripleee
  • If you prefer Python, http://stackoverflow.com/questions/7541863/python-equivalent-of-find2perl has some hints. – tripleee Jul 18 '13 at 07:19