Things to bear in mind when parsing the output of ls -l:
- The format depends on the locale. POSIX only specifies the format in the POSIX/C locale, and even then it allows some variation (such as the amount of spacing between the fields, the width of the first field...). For instance, you can't easily and portably detect file names that start with blank characters.
- Many systems allow blanks in user and group names, which makes parsing the output reliably almost impossible there. It's best to use ls -n (which prints numeric user and group IDs) instead of ls -l.
- It's impossible to parse the output of ls reliably if file names may contain newline characters (and newline is allowed in file names on virtually all POSIX systems), unless you use the -q option (but then you only see a quoted representation from which you can't get back to the original file name) or non-standard options found in some implementations. Though see also the trick below.
- The size field is not provided for all types of files (and the meaning of the size field varies between systems for some types of files). You'd probably want to limit to regular files.
- The above assumes a POSIX ls. Old versions have been known to use different output formats, or to omit blanks between fields under some circumstances...
So, with that in mind, provided you can guarantee that file names don't contain newline characters and don't start with blank characters, to list the regular files whose size is strictly less than 1MiB, you could do:
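(
export LC_ALL=C
ls -n | awk '
# regular files ("-" type) whose size (5th field) is below 1 MiB
/^-/ && $5 < 1048576 {
# strip the first 8 fields, once only, leaving just the file name
sub(/([^[:blank:]]+[[:blank:]]+){8}/, "")
print
}'
)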
Add the -a option if you want to include hidden files. Add -L if, for symlinks, you want to consider the file they (eventually) resolve to.
As others have said, the correct solution here would be to use find.
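For comparison, here is a minimal sketch of what a find-based version could look like, assuming a POSIX find. The function name small_files is only an illustration, not something from the question. It lists regular files (hidden ones included) directly in a directory, without descending into subdirectories, and needs no parsing at all:

```shell
# Hypothetical helper: list regular files directly in directory $1
# (default: .) whose size is strictly less than 1 MiB.
small_files() (
  cd -- "${1:-.}" || exit
  # "! -name . -prune" keeps find from descending into subdirectories.
  # "-size -1048576c" selects files of fewer than 1048576 bytes.
  find . ! -name . -prune -type f -size -1048576c
)
```

Called as small_files /some/dir, it prints one ./name path per file. File names containing newlines are still ambiguous in that output, but you can replace the implicit -print with -exec some-command {} + to act on the files safely.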
Trick to avoid the newline and leading blank issue.
If, instead of ls -n, we use ls -nd ./*, we can tell where each file name begins (at ./) and on which line it ends (the line before the next ./), so you could do:
(
export LC_ALL=C
ls -nd ./* | awk '
/\// {
selected = (/^-/ && $5 < 1048576)
sub(/.*\//, "./")
}
selected'
)
Note, however, that it won't work if there's a large number of files in the current directory: ./* is expanded by the shell, and the result could exceed the limit on the size of the argument list.
To include hidden files, -a won't help here; we'd need to tell the shell to expand them. POSIXly, it's a bit unwieldy:
ls -dn ./..?* ./.[!.]* ./*
(which is likely to cause warning messages about missing ./..?* or ./.[!.]* files).
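One way to avoid those warnings is to build the argument list in a loop and skip any pattern that didn't match anything. The following is a sketch under that assumption, combining the three patterns above with the awk filter from the trick. The function name small_files_all is made up for illustration, and the argument-list size limit still applies:

```shell
# Hypothetical helper: like the ls -nd ./* trick, but including hidden files.
small_files_all() (
  export LC_ALL=C
  set --
  for f in ./..?* ./.[!.]* ./*; do
    # A pattern that matches nothing is left as is by the shell;
    # skip it unless a file by that exact name really exists.
    [ -e "$f" ] || [ -L "$f" ] || continue
    set -- "$@" "$f"
  done
  if [ "$#" -gt 0 ]; then
    ls -dn "$@" | awk '
      /\// {
        selected = (/^-/ && $5 < 1048576)
        sub(/.*\//, "./")
      }
      selected'
  fi
)
```

Run from the directory of interest, it prints each selected file as ./name, with any continuation lines of a multi-line file name following on the next lines.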