1

I am trying to populate a text file with the names of all the .fits files in a folder, using this command:

ls *.fits > output_all.txt

The number of .fits files in the folder is >330k and I get the error message

bash: /usr/bin/ls: Argument list too long

How can I solve this?

Alternatively, it might be possible to avoid creating the file output_all.txt at all. I only need it to tell the STILTS tool which .fits files to merge into one large .fits file with this command:

stilts tcat in=@output_all.txt out=table_stilts.fits icmd='keepcols "FLUX LOGLAM"'

If you know a way to make STILTS accept a directory as input instead of a file, that would also solve my problem with ls. Thanks

NeStack
  • 149

2 Answers

7

In ls *.fits, it's the shell that does all the hard work of finding the filenames that end in .fits and don't start with a dot (.).

Then it passes that list to ls, which sorts it (again, as shell globs already sort the list before passing to ls) and displays it (in columns or one per line depending on the implementation and whether the output goes to a terminal or not) after having checked that each file exists.

So it's a bit counter-productive especially considering that:

  • you forgot the -- option delimiter, so any filename starting with - would cause problems.
  • you forgot the -d option, so if any file is of type directory, ls would list their contents instead of themselves.
  • as ls is a separate command from the shell (in most shells, including bash), it has to be executed in a separate process using the execve() system call, and you end up tripping that system call's limit on the cumulative size of arguments and environment variables.

If you just need to print the list generated by the shell from *.fits, you can use printf instead, which is built into most shells (and therefore doesn't go through execve() and isn't subject to its limit):

printf '%s\n' *.fits > output_all.txt

That leaves one problem though:

If *.fits doesn't match any file, the bash shell leaves *.fits as-is, so printf will end up printing *.fits<newline>, whereas ls would give you an error message about that non-existent *.fits file and leave output_all.txt empty.

That can be changed with the nullglob option (which bash copied from zsh) which causes *.fits to expand to nothing instead. But then we run into another problem: when not passed any argument beside the format, printf still goes through the format once as if passed empty arguments, so you'd end up with one empty line in output_all.txt.
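Both pitfalls described above are easy to reproduce. The following is a minimal sketch, assuming bash (or another POSIX-style sh) with default glob options; the scratch directory and the variable names are only for illustration.

```shell
# Use an empty scratch directory so that *.fits matches nothing.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1

# Pitfall 1: with default options, an unmatched glob is passed on literally.
unmatched=$(printf '%s\n' *.fits)
echo "unmatched glob printed as: $unmatched"

# Pitfall 2: printf given a format but no arguments still goes through the
# format once, so the output holds one empty line rather than nothing.
empty_lines=$(printf '%s\n' | wc -l)
echo "lines printed with no arguments: $empty_lines"

# Clean up the scratch directory.
cd / && rm -rf "$demo_dir"
```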

That one can be worked around with:

shopt -s nullglob
println() {
  [ "$#" -eq 0 ] || printf '%s\n' "$@"
}
println *.fits > output_all.txt

If you can switch to zsh instead of bash, it becomes easier:

print -rC1 -- *.fits(N) > output_all.txt

Where N enables nullglob for that one glob and print -rC1 prints its arguments raw on 1 Column, and importantly here: prints nothing if not passed any argument.

With zsh, you can also restrict the list to regular files only (excluding directories, symlinks, FIFOs, and so on) using the . glob qualifier (*.fits(N.) for instance), or include hidden files with D (*.fits(ND.)).


Lastly, you can always fall back on find to locate the files, but if you need the list to be sorted, hidden files to be excluded, and no ./ prefix, that quickly becomes tedious as well, and you'd need GNU extensions. For example, for the equivalent of print -rC1 -- *.fits(N.):

LC_ALL=C find . -maxdepth 1 ! -name '.*' -name '*.fits' -type f -printf '%P\0' |
  sort -z | tr '\0' '\n' > output_all.txt
3

There is a limit on the total size of the arguments (plus environment variables) that can be passed to an external command. On a modern Linux system, it's about 2 million bytes by default. This may vary on other systems.
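To see the limit on a given machine, something like the following should work. It is only a sketch: getconf ARG_MAX reports the limit in bytes, and the variable names are illustrative.

```shell
# Limit (in bytes) on the combined size of the arguments and environment
# handed to a newly executed program.
arg_max=$(getconf ARG_MAX)
echo "ARG_MAX: $arg_max bytes"

# The environment counts against the same limit; this is its rough size.
env_bytes=$(env | wc -c)
echo "environment: $env_bytes bytes"
```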

It looks as if the filenames of your *.fits files in the current directory won't fit onto a single command line. There are many ways to deal with this; one of the simplest is to use find instead. For example:

find . -maxdepth 1 -type f -name '*.fits' > output_all.txt

Another option is to use perl. e.g.

perl -e 'print map { "$_\n" if -f $_ } sort glob "*.fits"' > output_all.txt

or:

perl -E 'foreach $f (sort glob "*.fits") { say $f if -f $f }' > output_all.txt

or even:

perl -E 'foreach $f (sort grep { -f } glob "*.fits") { say $f }' > output_all.txt

(there are many ways of doing this in perl).

Note: the find version will list hidden filenames (i.e. those beginning with a .) if any exist in the current dir; the perl version won't. Both versions print only regular files ending in .fits, not directories, named pipes, sockets, or device nodes. One difference: find's -type f also skips symlinks, whereas perl's -f test follows them, so a symlink pointing to a regular file is included in the perl output. The find output is unsorted: filenames are printed in the order they're found in the directory, each with a ./ prefix. The perl version is sorted in ascending alphabetical order. (BTW, perl's built-in sort function is flexible enough to sort by a variety of criteria, including the size or timestamps of a file.)
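If you want the find output to look like the perl output (sorted, no hidden files, no ./ prefix), one possible sketch is shown here. It assumes that no filename contains a newline; the scratch directory and sample filenames are made up for the demonstration.

```shell
# Scratch directory with a few sample entries (illustrative names only).
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch b.fits a.fits .hidden.fits notes.txt
mkdir dir.fits

# Regular, non-hidden *.fits files only; strip the leading ./ and sort
# bytewise. Assumes no newlines in the filenames.
find . -maxdepth 1 -type f -name '*.fits' ! -name '.*' |
  sed 's|^\./||' |
  LC_ALL=C sort > output_all.txt

cat output_all.txt
listing=$(cat output_all.txt)

# Clean up the scratch directory.
cd / && rm -rf "$demo_dir"
```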

BTW, perl has a File::Find library module that can do recursive searches like find can but with the full power of perl for filtering, sorting, and manipulating any filenames it finds, and then processing those files. File::Find is a core library and is included with perl. If you only need to search for files in a particular directory, perl's glob() function is good enough.


AFAICT from skimming the stilts link you posted, it doesn't seem as if stilts' in=@filename arg can handle NUL-separated input, so if you have any .fits files with newlines embedded in the filename, you'll need to rename those files.

If stilts could handle NUL-separated list of filenames, you could use find's -print0 option or change the \n in the perl script to \0 to generate a NUL-separated list. This may not be relevant to stilts, but it's useful to know if you run into the same problem with other programs that can handle NUL as a separator (many programs have a -z, -Z, and/or -0 option for this).

find . -maxdepth 1 -type f -name '*.fits' -print0 > output_all.nul

or

perl -e 'print map { "$_\0" if -f "$_" } sort glob "*.fits"' > output_all.nul
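As a rough illustration of why the NUL-separated form matters and how another program might consume it, here is a sketch. It assumes a find that supports -maxdepth and -print0 and an xargs that supports -0 (GNU and BSD versions do); the scratch directory and filenames are invented for the demonstration.

```shell
# Scratch directory containing one filename with an embedded newline.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch a.fits b.fits "$(printf 'two\nlines.fits')"

# Newline-separated output miscounts: three files appear as four lines.
line_count=$(find . -maxdepth 1 -type f -name '*.fits' -print | wc -l)

# NUL-separated output has exactly one terminator per file.
find . -maxdepth 1 -type f -name '*.fits' -print0 > output_all.nul
nul_count=$(tr -cd '\0' < output_all.nul | wc -c)
echo "lines: $line_count, NUL-terminated entries: $nul_count"

# xargs -0 reads the NUL-separated list and splits it into as many
# command invocations as needed to stay under the argument-size limit.
xargs -0 ls -d -- < output_all.nul

# Clean up the scratch directory.
cd / && rm -rf "$demo_dir"
```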
cas
  • 78,579
  • Thanks! How is your answer different from the suggestion by Bib using ls | grep .fits$ > output_all.txt? Does your command do something different? – NeStack Nov 26 '21 at 16:07
  • @NeStack this will include hidden files (e.g. .foo.fits) and will exclude directories, symlinks and anything else not a file. Also, the ls one would descend into subdirectories and list their contents, which this one does not. Note that both this approach and the ls one would fail if any of your file names have newlines. Of course, given that you are storing the result in a text file, I suspect that isn't a problem for you. – terdon Nov 26 '21 at 16:10