I had a directory with around 5 million files. When I tried to run the ls command from inside this directory, my system consumed a huge amount of memory and hung after some time. Is there an efficient way to list the files other than using the ls command?

4 Answers
Avoid sorting by using:
ls --sort=none # "do not sort; list entries in directory order"
Or, equivalently:
ls -U
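If you also want to avoid per-file work and column buffering, here is a minimal sketch based on the comments below (assuming GNU coreutils ls, where -f implies -a and disables sorting, -l and --color, and -1 prints one name per line instead of buffering for columns):
ls -f -1           # unsorted, one name per line, no per-file stat
ls -f -1 | wc -l   # just count the entries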

- I wonder how much overhead the column layout adds, too. Adding the -1 flag could help. – Mikel Mar 18 '14 at 14:03
- @Mikel Is that just a guess, or have you measured that? To me it seems that -1 takes even longer. – Hauke Laging Mar 18 '14 at 19:32
- "-1" helps quite a bit. "ls -f -1" will avoid any stat calls and print everything immediately. The column output (which is the default when sending to a terminal) makes it buffer everything first. On my system, using btrfs in a directory with 8 million files (as created by "seq 1 8000000 | xargs touch"), "time ls -f -1 | wc -l" takes under 5 seconds, while "time ls -f -C | wc -l" takes over 30 seconds. – Scott Lamb Dec 15 '15 at 16:08
- @ScottLamb Your two commands are not a good comparison, because -C forces columns; it overrides the default -1 that is used when piping the output of ls. AFAIK, time ls -f | wc -l will run just as fast as the -1 version. Nevertheless I upvoted your comment, because when displaying straight to the terminal it is useful: you immediately start seeing some filenames. – ToolmakerSteve Apr 01 '19 at 14:15
- @ToolmakerSteve The default behavior (-C when stdout is a terminal, -1 when it's a pipe) is confusing. When you're experimenting and measuring, you flip between seeing the output (to ensure the command is doing what you expect) and suppressing it (to avoid the confounding factor of the terminal application's throughput). Better to use commands that behave the same way in both modes, so explicitly define the output format via -1, -C, -l, etc. – Scott Lamb Apr 01 '19 at 16:17
- @ScottLamb I understand. I realized later that you were deliberately doing the test with | wc -l as a convenience for timing, but that you were really discussing the underlying performance with or without the pipe (without a pipe, -C is the default behavior, as you say); you were showing that if columns were being formed, the command was much slower for many files. Thank you. – ToolmakerSteve Apr 03 '19 at 11:24
- @ScottLamb's ls -f -1 command resulted in actually seeing output instead of either waiting for a very long time or getting an out-of-memory error. Thanks! – Christian Apr 10 '23 at 20:00
ls actually sorts the files and tries to list them, which becomes a huge overhead when we are trying to list more than a million files inside a directory. As mentioned in this link, we can use strace or find to list the files. However, those options also seemed infeasible for my problem since I had 5 million files. After a bit of googling, I found that listing the directory with getdents() is supposed to be faster, because ls, find, and the Python libraries use readdir(), which is slower (but uses getdents() underneath).
We can find the C code to list the files using getdents() from here:
/*
 * List directories using getdents() because ls, find and Python libraries
 * use readdir(), which is slower (but uses getdents() underneath).
 *
 * Compile with
 * ]$ gcc getdents.c -o getdents
 */
#define _GNU_SOURCE
#include <dirent.h>     /* Defines DT_* constants */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/syscall.h>

#define handle_error(msg) \
    do { perror(msg); exit(EXIT_FAILURE); } while (0)

/* Layout of the records returned by the raw getdents() system call. */
struct linux_dirent {
    long           d_ino;
    off_t          d_off;
    unsigned short d_reclen;
    char           d_name[];
};

#define BUF_SIZE 1024*1024*5

int
main(int argc, char *argv[])
{
    int fd, nread;
    char buf[BUF_SIZE];
    struct linux_dirent *d;
    int bpos;
    char d_type;

    /* Open the directory given as the first argument, or "." by default. */
    fd = open(argc > 1 ? argv[1] : ".", O_RDONLY | O_DIRECTORY);
    if (fd == -1)
        handle_error("open");

    for ( ; ; ) {
        /* Fetch a batch of raw directory entries straight from the kernel. */
        nread = syscall(SYS_getdents, fd, buf, BUF_SIZE);
        if (nread == -1)
            handle_error("getdents");
        if (nread == 0)
            break;

        /* Walk the variable-length records in the buffer. */
        for (bpos = 0; bpos < nread;) {
            d = (struct linux_dirent *) (buf + bpos);
            /* The entry type is stored in the last byte of each record. */
            d_type = *(buf + bpos + d->d_reclen - 1);
            if (d->d_ino != 0 && d_type == DT_REG) {
                printf("%s\n", (char *) d->d_name);
            }
            bpos += d->d_reclen;
        }
    }

    exit(EXIT_SUCCESS);
}
Copy the C program above into the directory in which the files need to be listed. Then execute the commands below.
gcc getdents.c -o getdents
./getdents
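You can also pass the target directory as an argument instead of copying the program into it, and, as noted in the comments, piping to wc -l stays fast because the names go into the pipe rather than to the terminal. A small usage sketch, with /my/huge/dir as a placeholder path:
./getdents /my/huge/dir           # print regular files in directory order
./getdents /my/huge/dir | wc -l   # just count them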
Timings example: getdents can be much faster than ls -f, depending on the system configuration. Here are some timings demonstrating a 40x speed increase for listing a directory containing about 500k files over an NFS mount in a compute cluster. Each command was run 10 times in immediate succession, first getdents, then ls -f. The first run is significantly slower than all the others, probably due to NFS caching page faults. (Aside: over this mount, the d_type field is unreliable, in the sense that many files appear as "unknown" type.)
command: getdents $bigdir
usr:0.08 sys:0.96 wall:280.79 CPU:0%
usr:0.06 sys:0.18 wall:0.25 CPU:97%
usr:0.05 sys:0.16 wall:0.21 CPU:99%
usr:0.04 sys:0.18 wall:0.23 CPU:98%
usr:0.05 sys:0.20 wall:0.26 CPU:99%
usr:0.04 sys:0.18 wall:0.22 CPU:99%
usr:0.04 sys:0.17 wall:0.22 CPU:99%
usr:0.04 sys:0.20 wall:0.25 CPU:99%
usr:0.06 sys:0.18 wall:0.25 CPU:98%
usr:0.06 sys:0.18 wall:0.25 CPU:98%
command: /bin/ls -f $bigdir
usr:0.53 sys:8.39 wall:8.97 CPU:99%
usr:0.53 sys:7.65 wall:8.20 CPU:99%
usr:0.44 sys:7.91 wall:8.36 CPU:99%
usr:0.50 sys:8.00 wall:8.51 CPU:100%
usr:0.41 sys:7.73 wall:8.15 CPU:99%
usr:0.47 sys:8.84 wall:9.32 CPU:99%
usr:0.57 sys:9.78 wall:10.36 CPU:99%
usr:0.53 sys:10.75 wall:11.29 CPU:99%
usr:0.46 sys:8.76 wall:9.25 CPU:99%
usr:0.50 sys:8.58 wall:9.13 CPU:99%
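The answer does not say how these usr/sys/wall/CPU lines were produced; one way to get output in this shape is GNU time's format string (a sketch, assuming /usr/bin/time is GNU time and $bigdir holds the directory path, with output discarded to avoid measuring the terminal):
/usr/bin/time -f "usr:%U sys:%S wall:%e CPU:%P" ./getdents "$bigdir" > /dev/null
/usr/bin/time -f "usr:%U sys:%S wall:%e CPU:%P" /bin/ls -f "$bigdir" > /dev/null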
- Could you add a small timing benchmark showing how your case performs with ls? – Bernhard Mar 17 '14 at 16:34
- Sweet. And you could add an option to simply count the entries (files) rather than listing their names (saving millions of calls to printf, for this listing). – ChuckCottrill Mar 17 '14 at 20:50
- Since the directory has millions of files, you could use puts((char*)d->d_name) rather than printf, to save some processing; see: http://bytes.com/topic/c/answers/527094-puts-vs-printf – ChuckCottrill Mar 17 '14 at 21:26
- You know your directory is too big when you have to write custom code to list its contents... – casey Mar 18 '14 at 01:45
- @casey Except you don't have to. All this talk about getdents vs readdir misses the point. – Mikel Mar 18 '14 at 13:35
- Come on! It's already got 5 million files in there. Put your custom "ls" program into some other directory. – Johan Mar 19 '14 at 08:35
- @ChuckCottrill Because of the way piping works, that's not really necessary. You could just run ./getdents /my/huge/dir | wc -l and it will still be pretty fast. That's because you are giving the output of getdents to wc instead of to stdout (the terminal in most cases). – anu Nov 04 '16 at 18:58
- Not a C programmer... any chance somebody can update this so it has glob support? I'm currently piping results through grep, but that seems awfully suboptimal. – mlissner May 04 '17 at 17:46
- Is it expected that this would do very weird things in an sshfs-mounted directory? I'm getting back a fraction of the results I expect. – mlissner May 05 '17 at 22:48
- @Ramesh Good one!! Is there any way to get file properties like date modified, size, etc.? – Joby Wilson Mathews Nov 03 '17 at 10:21
The most likely reason why it is slow is file-type colouring; you can avoid this with \ls or /bin/ls, which bypass any colouring alias and turn off the colour options.
If you really have that many files in a directory, using find instead is also a good option.

- I don't think this should have been downvoted. Sorting is one problem, but even without sorting, ls -U --color would take a long time since it would stat each file. So both are correct. – Mikel Mar 18 '14 at 13:59
- Turning coloring off has a huge impact on the performance of ls, and it is aliased on by default in many, many .bashrc files out there. – Victor Schröder Aug 07 '18 at 13:59
- Yup, I did a /bin/ls -U and got output in no time, compared to waiting for a very long time before. – khebbie Oct 11 '19 at 07:03
I find that echo * works much faster than ls. YMMV.

- The shell will sort the *. So this way is probably still very slow for 5 million files. – Mikel Mar 18 '14 at 13:52
- @Mikel More than that, I'm pretty sure that 5 million files is over the point where globbing will break entirely. – evilsoup Mar 18 '14 at 15:26
- Minimum file name length (for 5 million files) is 3 characters (maybe 4 if you stick to more common characters) plus delimiters = 4 chars per file, i.e. 20 MB of command arguments. That is well over the common 2 MB expanded command line length. Exec (and even the builtins) would baulk. – Johan Mar 19 '14 at 08:49
- … ls that uses --color or -F, as that would mean doing an lstat(2) for each file. – Stéphane Chazelas Mar 17 '14 at 21:37
- … ls call or did you use options? – Hauke Laging Mar 18 '14 at 03:27