
Reading the accepted answer to the question locate vs find: usage, pros and cons of each other, which says that locate's main advantage is speed, I wanted to do some testing to see whether I could benefit from using it.

My first step was to estimate the speed of the find tool when providing a service comparable to locate (hence only searching filenames, no extras).

I was surprised to see that

time find / 2>/dev/null >/dev/null

which I assumed iterates over all files (depending on the user's permissions), showed

real    0m1.231s
user    0m0.353s
sys 0m0.867s

a rather quick result.

My question is whether the command above is a valid way to actually benchmark the speed of find.

An aspect of the question I would be interested to have answered is whether there are some sort of buffers or caches in the filesystem, hence in the OS (a Linux kernel), which would impact the result.
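(One way I could probe for such caches, assuming the Linux /proc/meminfo counters, would be to watch the page cache and the reclaimable slab, which holds dentries and inodes, around a run:)

grep -E '^(Cached|Buffers|SReclaimable):' /proc/meminfo   # before the run
find / 2>/dev/null >/dev/null
grep -E '^(Cached|Buffers|SReclaimable):' /proc/meminfo   # after: SReclaimable should have grown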

My results were that dropping the caches via echo 3 > /proc/sys/vm/drop_caches vastly increased the runtime of find:

$ sudo bash -c "echo 3 > /proc/sys/vm/drop_caches"
$ time (find / 2>/dev/null >/dev/null)
real    0m24.290s
user    0m1.143s
sys 0m8.230s

Yet on my Linux system, subsequent invocations of find returned to mlocate-like speeds of about 1 second.

Summed up, I am interested in a way to benchmark the find command (so as to compare it with locate); see the sketch below for the kind of measurement I mean.
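For concreteness, the cold-versus-warm measurement I have in mind is a sketch like this (it needs to run as root for dropping the caches, and assumes bash for the time keyword):

#!/bin/bash
# Sketch: one cold-cache run of find /, followed by warm-cache runs.
sync                                  # flush dirty pages first
echo 3 > /proc/sys/vm/drop_caches     # drop page cache, dentries and inodes
echo "cold run:"
time find / 2>/dev/null >/dev/null
for i in 1 2 3; do
    echo "warm run $i:"
    time find / 2>/dev/null >/dev/null
done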

Update/Remark

While the question was motivated by another one comparing locate with find, and I ask about measuring/benchmarking the speed of find, I am aware that it is highly unlikely that gathering data from the live OS/filesystem (i.e. find) would be faster than a lookup in a database (i.e. locate). With the rather good caching behaviour of my operating system's kernel, I nonetheless saw rather similar execution times for searching via find or locate.

The question hence boils down to whether it is enough to drop the operating system's (filesystem) caches to simulate the "actual" time needed for a find done at a cold start, and furthermore how realistic it is to assume that those speed-enhancing caches persist (not unlike the locate database file maintained by updatedb) for all subsequent find calls.
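A related check I have been considering, to see whether those caches survive memory pressure (stress-ng and its percentage syntax are an assumption here; any sufficiently large memory hog should do):

time find / 2>/dev/null >/dev/null          # warm run, fast
stress-ng --vm 1 --vm-bytes 75% -t 30s      # assumed tool: occupy most of the RAM for 30 seconds
time find / 2>/dev/null >/dev/null          # likely slow again if dentries/inodes were evicted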

  • I'm sorry to say that your benchmark makes no sense at all. Even on a RAM disk with a minimal installation, it would take more than a second to traverse the full / partition. I suspect you get permission errors that stop it from doing much. – Julie Pelletier Jan 05 '17 at 15:57
  • @JuliePelletier considering your comment, should I understand that I need another command to benchmark the speed of find, or that you intend to hint that there is no possible way for such a benchmark? Both would make a good starting point for an answer – humanityANDpeace Jan 05 '17 at 16:00
  • My comment implies that your test is apparently bad and you should redirect the output to a file instead of /dev/null to see what's actually going on. Expect permission errors as I mentioned, which are most likely since you are apparently not logged in as root. – Julie Pelletier Jan 05 '17 at 16:03
  • I don't know how to measure the speed of both commands, but by using the command strace you will be able to see all their function calls; maybe it is useful for you. Take a look at the strace man page. – Zumo de Vidrio Jan 05 '17 at 16:08
  • @JuliePelletier I have not considered permission errors to be a problem, but rather a valid part of the benchmark, in which ideally everything in / is found. I piped the output to /dev/null as I did not want to cause distortion (i.e. a measuring of output-buffer handling performance) when the focus should have been find's iteration. I will take your advice though and check if permission errors are creating unpredictable results – humanityANDpeace Jan 05 '17 at 16:14

1 Answer


On OpenBSD, the locate database is by default rebuilt once a week by the /etc/weekly script invoking /usr/libexec/locate.updatedb as user nobody.

The locate.updatedb utility is a /bin/sh script (pdksh on OpenBSD) that more or less runs find on the root filesystems. Anything that the nobody user can access is put into the locate database.
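In spirit (this is a simplification, not the verbatim OpenBSD script, and the doas invocation is illustrative), it amounts to:

# run find as the unprivileged nobody user and encode the result into the database
doas -u nobody find / -print 2>/dev/null |
    /usr/libexec/locate.mklocatedb > /var/db/locate.database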

I find it hard to believe that find / would be quicker than locate on a system where locate uses a database of files that has been created through find /.

The difference is, of course, that you may find more files by running find as a user that has more access than the nobody user.

On Linux, at least on the Ubuntu machine that I have access to at work, the locate database seems to be recreated on a daily basis, according to the locate(8) manual. This is done through the updatedb utility.

This utility (a symbolic link to /usr/bin/updatedb.mlocate on this machine) is a compiled binary belonging to the package mlocate.
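You can check which implementation is behind updatedb on such a system, for instance:

ls -l "$(command -v updatedb)"                        # follow the symlink chain
dpkg -S "$(readlink -f "$(command -v updatedb)")"     # which package provides the real binary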

You can have a look at the sources of mlocate if you wish, but it's basically a C program that traverses the file system. mlocate also tries to avoid traversing bits of the file system that haven't changed between runs.
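An easy way to observe that incremental behaviour (illustrative; timings will vary) is to run it twice in a row:

sudo bash -c 'time updatedb'    # first run: full traversal of the filesystem
sudo bash -c 'time updatedb'    # second run: unchanged directories are skipped, so it should be much faster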

Again, I find it hard to believe that querying the mlocate database would be slower (under any circumstances) than running find /.

At the end of the day, this is why all locate tools (that I know about) work against a database.
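If you want to measure the difference directly on Linux, a rough sketch would be:

time locate '*' > /dev/null          # query the prebuilt database
time find / 2>/dev/null >/dev/null   # traverse the live filesystem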

Kusalananda
  • on my Linux box (4.8.xx build) I found that with caches (as indicated in the question) any subsequent find call was surprisingly quick. I did not intend to imply any belief that find could beat locate; I was simply surprised that the Linux kernel's caching did such a good job that find did not even take twice as long. – humanityANDpeace Jan 05 '17 at 18:34
  • @humanityANDpeace To benchmark find /, you would have to clear the cache between runs. I suppose the easiest way to do that would be to reboot. – Kusalananda Jan 05 '17 at 18:36
  • In the question I guessed that sudo bash -c "echo 3 > /proc/sys/vm/drop_caches" would be sufficient. I will test it, though, first thing after my next reboot :) – humanityANDpeace Jan 05 '17 at 18:52