
A directory of mine contains a huge number of files. I want to find out what kinds of files they are and which kinds are so numerous.

Here is what happens when I try a few commands:

ls -l | wc -l
1514340

ls | head -n 4
2004112700001.htm
2004112700002.htm
2004112700003.htm
2004112700004.htm

ls *.xml | head -n 4
20041127.xml
20041225.xml
20050101.xml
20050108.xml

ls -l *.htm | wc -l
bash: /bin/ls: Argument list too long
0

# Any other ls command with *.htm or *.* fails in the same way.

My understanding is that wc -l has to wait until the output of ls -l *.htm is entirely produced before it starts analyzing it, and because that output is too big, the command fails.

Is that really what is happening?

What is the right way to make ls work together with wc -l in this case? Is there a way to ask wc to start asynchronously, before the output is entirely complete?
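
For what it's worth, piping a very large stream into wc works, so I am not sure the size of the output alone explains the failure. A quick test (yes, head and wc are standard tools):

# wc -l counts a stream far larger than any pipe buffer without a problem
yes | head -n 10000000 | wc -l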

  • It's not wc failing because the output is too big, or the pipe overflowing. ls is not even starting, because *.htm expands into too many arguments for it. – muru Jun 04 '20 at 06:29
  • @muru: How can that be? There is no file extension starting with htm other than htm itself. No .html files, for example. – Marc Le Bihan Jun 04 '20 at 07:06
  • So what? *.htm expands to 2004112700001.htm 2004112700002.htm 2004112700003.htm 2004112700004.htm ... then ls is run with all those filenames as arguments, which exceeds the argument length limit. Whether or not you have a .html file makes no difference. Please see the dupe. – muru Jun 04 '20 at 07:08
  • @muru Isn't *.htm the arg[0] that ls, as a C program, takes and resolves as a file filter with the classic findFirst/findNext functions? How would ls manage to expand *.htm into a list of files? By doing an ls itself? – Marc Le Bihan Jun 04 '20 at 07:17
  • Never heard of these classical functions. ls doesn't expand anything; the shell does. See, e.g., https://unix.stackexchange.com/q/17938/70524 – muru Jun 04 '20 at 07:28

1 Answer


It's the same problem as when you try to remove millions of files with rm * in a directory: the shell expands the wildcard into all the matching filenames before running the command, and the resulting argument list is longer than the system can accept.
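
If you want to see the limit involved, something along these lines should show it (assuming Linux and bash; printf is a shell builtin, so expanding the glob for it does not go through the kernel's limit):

# maximum total size of the arguments + environment a new process may receive
getconf ARG_MAX

# rough size of the expanded *.htm list; printf is a builtin, so nothing is
# exec'd and the limit above does not apply here
printf '%s ' *.htm | wc -c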

I would suggest using find instead, for example:

find . -mindepth 1 -maxdepth 1 -name "*.html" | wc -l
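
If the goal is also to see which kinds of files are the most numerous, a rough per-extension breakdown can be had with something along these lines (GNU find assumed, as above; filenames containing newlines would throw the counts off):

# print only the filenames, keep what follows the last dot, count per extension
find . -mindepth 1 -maxdepth 1 -type f -printf '%f\n' | sed -n 's/.*\.//p' | sort | uniq -c | sort -rn
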
darxmurf
  • Note that it also counts hidden files, doesn't work properly if filenames contain newline characters, and with many find implementations would skip filenames containing byte sequences that don't form valid characters in the locale. – Stéphane Chazelas Jun 04 '20 at 06:36
  • Well, yes, but if you rack up a billion files with spaces and exotic characters in their names, that makes things a bit complicated :-) – darxmurf Jun 04 '20 at 06:47
  • Here, you could do count() { echo "$#"; }; count *.html, which wouldn't have either of those problems (but gives you 1 instead of 0/error when there's no matching file, unless you turn on nullglob/failglob; see the sketch after these comments). With find, that could be addressed with LC_ALL=C find . ! -name . -prune -name '*.html' ! -name '.*' -print | LC_ALL=C grep -c / (here also avoiding the -m??depth GNU extensions). – Stéphane Chazelas Jun 04 '20 at 06:51
  • Your command works if I search for .htm files: it returns 1513532 files. Given the total of 1514340 files I had, and 807 of the .xml kind, there is only one remaining file whose extension I don't know yet. So I still can't really understand why ls refused my command, if not because of some kind of buffer overflow; it can't be the number of arguments. Only three types of files are in my directory: .xml, .htm, and one last file I don't know, but it's a single one. – Marc Le Bihan Jun 04 '20 at 07:13
  • Have a look here: https://unix.stackexchange.com/questions/38955/argument-list-too-long-for-ls – darxmurf Jun 04 '20 at 07:14
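
A minimal sketch of the glob-counting idea from the comments above, assuming bash (nullglob makes an empty match count as 0 instead of 1):

#!/bin/bash
# The glob is expanded by the shell and handed to a shell function, so no
# external command is exec'd and the ARG_MAX limit never comes into play.
shopt -s nullglob
count() { echo "$#"; }
count *.htm
count *.xml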