3

Related: What's the best way to count the number of files in a directory?

I have a system with a largish number of files in a directory:

 $ ls -god xml
 drwxrwsrwx   7 7070720 Mar 12 11:51 xml

If I try to count specific groups of files using ls xml/*query | wc -l, the system usually produces an error message:

 /bin/ls: arg list too long
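
As I understand it, that error comes from the shell's exec of ls, not from ls itself: the expanded xml/*query list exceeds the kernel's argument-size limit. The limit can be inspected with getconf (a POSIX utility):

 # Maximum combined length of exec() arguments plus environment, in bytes:
 getconf ARG_MAX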

I tried find xml -name '*query' | wc -l; there was no response after 10 minutes, at which point I terminated the command (a lower-overhead variant is sketched after the transcript below).

$ nohup time find xml -name '*query' -level 0 | wc -l &
[1]     11751

$ ps -f 2>rgb
     UID   PID  PPID  C    STIME     TTY        TIME CMD
     rgb 11751 10637  0 02:45:11  ttyp12    00:00:00 wc -l
     rgb 11752 11751  0 02:45:11  ttyp12    00:00:00 time find xml -name *query -level 0
     rgb 11753 11752 77 02:45:11  ttyp12    00:00:03 find xml -name *query -level 0
     rgb 11776 10637  1 02:45:17  ttyp12    00:00:00 ps -f
     rgb 10583 10581  0 02:30:13  ttyp12    00:00:00 -csh
     rgb 10637 10583  2 02:30:19  ttyp12    00:00:00 ksh

top -Urgb

last pid: 11864;  load averages:  1.21,  0.82,  0.66                   14:48:03
249 processes: 246 sleeping, 2 running, 1 onproc
CPU states:  0.0% idle, 24.5% user, 75.5% system,  0.0% wait,  0.0% sxbrk
Memory: 2048M phys, 1799M max, 1718M free, 1774M locked, 114M unlocked, K swap

  PID USERNAME PRI NICE   SIZE   RES  STATE   TIME  COMMAND
11837 rgb       26    0   804K   804K onpr    0:00  top
11753 rgb       56    4  5512K  5512K run     1:10  find
11751 rgb       51    4   588K   588K sleep   0:00  wc
10583 rgb       48    0  1204K  1204K sleep   0:00  -csh
11752 rgb       48    4   588K   588K sleep   0:00  time
10637 rgb       48    0  1288K  1288K sleep   0:00  ksh

last pid: 12330;  load averages:  1.82,  1.45,  1.05                   14:58:06
258 processes: 253 sleeping, 4 running, 1 onproc
CPU states:  0.0% idle, 20.7% user, 78.7% system,  0.6% wait,  0.0% sxbrk
Memory: 2048M phys, 1799M max, 1711M free, 1774M locked, 106M unlocked, K swap

  PID USERNAME PRI NICE   SIZE   RES  STATE   TIME  COMMAND
11837 rgb       26    0   804K   804K onpr    0:00  top
11753 rgb       -1    4  5512K  5512K run     5:10  find
11751 rgb       51    4   588K   588K sleep   0:00  wc
10583 rgb       48    0  1204K  1204K sleep   0:00  -csh
11752 rgb       48    4   588K   588K sleep   0:00  time
10637 rgb       48    0  1288K  1288K sleep   0:00  ksh

$ jobs
[1] +  Running                 nohup time find xml -name '*query' -level 0 | wc -l &

$ kill %1
[1] + Terminated               nohup time find xml -name '*query' -level 0 | wc -l &
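
For reference, a lower-overhead variant might look like this (a sketch assuming GNU find is available; the -level 0 above appears to be this system's equivalent of GNU's -maxdepth):

 # GNU find (an assumption): -maxdepth 1 stops recursion into
 # subdirectories, and -printf '.' emits one byte per match,
 # so wc -c yields the count directly.
 find xml -maxdepth 1 -name '*query' -printf '.' | wc -c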

Can I instead estimate the number of files, to within say 10%, from the 7070720-byte size of the directory given by ls -god xml?

Supplementary Q: To what extent does this depend on the filesystem (UFS, V7FS, HTFS, etc.)?


Update:

The command ls xml | wc -l did return a value in a few seconds. I should have tried this before posting the question. It provides the information I was asking for, so there's no point working out how many filename+inode entries fit in a 7070720-byte directory (answer: at least 260085).
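
(For the record, the arithmetic behind that parenthetical, using the figures above:)

 # 7070720-byte directory holding at least 260085 entries:
 echo $((7070720 / 260085))   # prints 27, i.e. roughly 27 bytes per entry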

  • Depends completely on the filesystem. If you don't specify that and the OS, the answer is "you can't in general". (And even if you do, that's probably going to be the answer anyway.) – Mat Mar 12 '13 at 12:21
  • "some sort of error message" is not quite what I would tell others if they are supposed to help me... There is no reason for find | wc to crash. Neither should consume a lot of resources. So give us the exact error message and have a look at the memory consumption of the processes. You may minimize the data flow by using find -printf . | wc -m – Hauke Laging Mar 12 '13 at 12:52
  • @Hauke: Mea Culpa. Question updated (but Q possibly redundant now). – RedGrittyBrick Mar 12 '13 at 14:41

4 Answers

4

ls wastes resources by sorting the output. If you have GNU ls, do this instead:

ls --quoting-style=escape -U xml | wc -l
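
To count only the *query files from the question, the same unsorted listing can be fed to grep (a sketch, still assuming GNU ls):

 # -U skips sorting so output streams immediately; --quoting-style=escape
 # keeps unusual filenames on a single line each; grep -c counts matches.
 ls --quoting-style=escape -U xml | grep -c 'query$'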
Hauke Laging
  • 90,279
  • Note: those options are probably GNU ls extensions. I am working on a Unix system with AT&T-style file utilities. I can, however, install GNU ls, so the answer applies. – RedGrittyBrick Mar 12 '13 at 15:14
  • 1
    @RedGrittyBrick Be glad you are allowed to install GNU ls! I could name a lot of administrators who do have a root account but who would never install a tool version "from outside"! The actual reason is that they are using one of the enterprise server distributions (e.g. RHEL) and they'd invalidate their warranty if they installed a "third-party" tool version (which the warranty would no longer apply to, obviously). And you had better obey these rules if you run your own business and heavily depend on this stuff running flawlessly (since it's your means of existence...) – syntaxerror Dec 14 '14 at 14:36
1
n=0; for file in ./*; do let "n += 1"; done; echo $n;
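
Adapted to the directory and pattern from the question, with a guard for the case where nothing matches (a sketch using POSIX arithmetic rather than let):

 # The glob is expanded by the shell itself, so no exec() happens and
 # the "arg list too long" limit never applies.
 n=0
 for file in xml/*query; do
   [ -e "$file" ] || continue   # skip the literal pattern on zero matches
   n=$((n + 1))
 done
 echo "$n"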
Eric
  • 111
1

Does

set -- *
echo $#

work? Since set is a shell built-in, it might not be affected by the arg limit. Note that this ignores dotfiles. With a more selective glob you may get exactly what you need. The beauty of this is that it doesn't require a single fork or pipe.
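
With the question's pattern, that might look like this (an untested sketch; if nothing matches, $# reports 1 because the unexpanded pattern remains, unless the shell offers a nullglob-style option):

 # set is a builtin, so the expanded list never passes through exec()
 # and the kernel's argument-size limit does not apply.
 set -- xml/*query
 echo $#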

Jens
  • 1,752
0

How about this command, which pipes the unsorted directory listing through grep and counts the matches:

$ ls -U ./xml/ | grep -c 'query$'

Note that depending on your flavor of *nix, you might want ls -u instead of ls -U.

  • 1
    Never parse ls; this can give wrong results. – Chris Down Sep 10 '13 at 23:47
  • 2
    Interesting, but that page starts off with a preposterous case of filenames containing newlines or other odd chars (which is solved by GNU ls --quoting-style=c and BSD ls -q) and doesn't address the question of how to count the number of files in a subdirectory containing a lot of files. – Mark Hudson Sep 13 '13 at 22:04