1

I am using the ls command in bash, trying to find all files or directory of length n. Let's say n=5

My command is:

ls ?????

But this would also include characters that are non letters such as period. For example, the following files would match:

ab.cd    
abd.c

I only want to match files that have 5 letter or number names:

five1
five2    
five3

But not

abc.d    
ab.cd    
a.bcd

How can I modify my command?

Answer found:

ls [a-zA-Z0-9][a-zA-Z0-9][a-zA-Z0-9][a-zA-Z0-9][a-zA-Z0-9][a-zA-Z0-9][a-zA-Z0-9]

I found the answer but how can I make this less ugly?

Mat
  • 52,586
Saad A
  • 189
  • 5
    (1) Why do you “have to use ls … instead of regex”?  If this is an actual constraint, you should [edit] it into the question rather than just mention it in a comment.  (2) If n=5, why are you repeating the [a-zA-Z0-9] regex seven times? – G-Man Says 'Reinstate Monica' Feb 15 '16 at 22:17
  • Do you simply want to exclude extensions like .txt, .out, .js, and five-char long ones like .pages? Are you really only interested in alphanumeric chars, or would q_pdf.tgz and f-193.zip also be files you're looking for? – Ryder Feb 16 '16 at 09:19
  • I can only use ls for an assignment – Saad A Apr 22 '16 at 14:49

7 Answers

10

Note that it's not ls that interprets those globs. Those globs are expanded by your shell into a list of file names that is passed as arguments to ls. Different shells have different globbing capabilities. bash has a few extensions over standard globs (borrowed from ksh88 and enabled with shopt -s extglob) but is still limited compared to shells like zsh or ksh93.
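To see that the expansion happens in the shell rather than in ls, here is a minimal sketch that captures the argument list any command would receive. The scratch directory and the sample file names are assumptions made only for the demonstration.

```shell
# Work in a throwaway directory; the file names are hypothetical samples.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch five1 five2 ab.cd sixsix

# The shell replaces ????? with the matching names before running anything.
# "set --" stores that list in the positional parameters, which is the same
# list ls would have received as its arguments.
set -- ?????
arg_count=$#
printf 'ls would receive: %s\n' "$@"
```

Here ls is never invoked at all, yet the list of five-character names is already complete, including ab.cd.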

With zsh:

setopt extendedglob
ls -d [[:alnum:]](#c5)

ksh93:

ls -d {5}([[:alnum:]])

or:

ls -d {5}(\w) # (\w includes underscore in addition to alnums)

or, if you wanted to use extended regular expressions:

ls -d ~(E)^[[:alnum:]]{5}$

With bash or other POSIX shells which don't have equivalent globbing operators, you'd need to do:

ls -d [[:alnum:]][[:alnum:]][[:alnum:]][[:alnum:]][[:alnum:]]

Note that [[:alnum:]] includes any alphabetic character in the current locale (not only those of the Latin alphabet, let alone the English one) and 0123456789 (and possibly other types of digits). If you want only the letters of the English alphabet, name the characters individually:

c='[0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ]'
unset -v IFS
ls -d $c$c$c$c$c

Or use the C locale:

(export LC_ALL=C
ls -d [[:alnum:]][[:alnum:]][[:alnum:]][[:alnum:]][[:alnum:]])
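Where the shell has no repetition operator, one possible alternative (a sketch, not something the answer above proposes) is to keep the short ????? glob and discard any name that contains a non-alphanumeric character with a case statement. The scratch directory and file names are assumptions for the demonstration.

```shell
# Throwaway directory with hypothetical sample names.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch five1 five2 ab.cd a.bcd sixsix

export LC_ALL=C   # restrict [[:alnum:]] to ASCII letters and digits

set --            # start with an empty argument list
for name in ?????; do
    case $name in
        *[![:alnum:]]*) ;;            # contains a non-alphanumeric character: skip
        *) set -- "$@" "$name" ;;     # five characters, all alphanumeric: keep
    esac
done

# Hand the surviving names to ls only if there are any.
if [ "$#" -gt 0 ]; then
    ls -d -- "$@"
fi
```

If nothing matches ????? at all, the unexpanded pattern itself is rejected by the first case branch, so ls is not called with a bogus argument.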
  • Why name them individually? bash supports ranges, which is what the OP used in the first place. – RealSkeptic Feb 16 '16 at 20:18
  • @RealSkeptic, [a-z] matches the 26 English letters only in the C locale. In other locales, it matches all sorts of things, it may even match sequences of characters (though that doesn't seem to happen for bash). Generally [a-z] matches the collating elements that sort between a and z. That would include things like á, œ, possibly ch, ý, but not ź... See What does "LC_ALL=C" do? for more details. – Stéphane Chazelas Feb 16 '16 at 21:48
  • i guess that is as much as i can do with ls in bash – Saad A Apr 22 '16 at 14:53
  • This drives me crazy!  You say that [[:digit:]] will match only 0123456789, but [[:alnum:]] will match 0123456789 *(and possibly other types of digits)!*  So there are characters that are matched by [[:alnum:]] without being matched by either [[:alpha:]] or [[:digit:]]?  The document that you referred me to two weeks ago says “isalnum … is specified as the union of isalpha and isdigit in IS C.” – G-Man Says 'Reinstate Monica' May 29 '19 at 03:48
  • @G-Man, yes see how on GNU system [[:alpha:]] matches some non-ASCII digits. Yes, I agree it's a mess. – Stéphane Chazelas May 29 '19 at 06:18
7
ls -q | grep -Ex '[[:alnum:]]{5}'
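The role of -q can be illustrated with a name that contains a newline. This is a sketch that assumes an ls implementing -q as POSIX describes (GNU ls does); the scratch directory and file names are made up for the demonstration.

```shell
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
# One ordinary name, and one single name that contains a newline.
touch five3 "$(printf 'five1\nfive2')"

# Without -q the newline splits one name into two lines that both look valid.
plain=$(ls | grep -Ex '[[:alnum:]]{5}')

# With -q the newline is printed as ?, so the odd name no longer matches.
quoted=$(ls -q | grep -Ex '[[:alnum:]]{5}')

printf 'without -q:\n%s\n\nwith -q:\n%s\n' "$plain" "$quoted"
```

The first pipeline reports two names that do not exist as files, while the second reports only the real five-character name.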
don_crissti
  • 82,805
Barefoot IO
  • 1,946
  • 5
    Parsing the output of ls is not a great idea: see Why you shouldn’t parse the output of ls(1).  However, the only real problem I can see with this answer is that it may give misleading results in the presence of files whose names include newline(s). – G-Man Says 'Reinstate Monica' Feb 15 '16 at 22:10
  • 1
    @G-Man - not really, see my edit. – don_crissti Feb 15 '16 at 22:25
  • It should be [[:alpha:]], letters only. – vonbrand Feb 16 '16 at 00:19
  • 3
    @vonbrand - why ? OP wants a solution that lists "letter or number names". – don_crissti Feb 16 '16 at 00:21
  • @G-Man, the idea is that with don_crissti's edit, a newline character would be changed to ? (at least when the output is not a terminal) and neither ? nor newline are matched by [[:alnum:]] so in this particular case, it's going to work, at least with POSIX compliant implementations of ls (not busybox which doesn't have -q, not ast-open's which doesn't work properly for newline) – Stéphane Chazelas May 28 '19 at 10:13
  • @StéphaneChazelas: I don’t really remember this, but I suspect that I probably understood it three years ago.  Why are you explaining it to me now? … … This is the second time in two weeks that you’ve pinged a comment to me on a thread that had been inactive for over a year.  Are you farming some badge? – G-Man Says 'Reinstate Monica' May 29 '19 at 03:47
  • @don_crissti: In vonbrand’s defense, the question is internally inconsistent.  The title says “n letter filename”, and the body expresses dissatisfaction with the ? glob because “this would also include characters that are non letters …”. – G-Man Says 'Reinstate Monica' May 29 '19 at 03:47
  • 1
    @G-Man It's just that your comment is garnering upvotes, but don_crissti's comment doesn't really explain why it's OK. My comment adds that clarification. Nothing to do with you specifically or any form of farming. – Stéphane Chazelas May 29 '19 at 09:33
5

If you're going to use ls in the way you mentioned, you should use the -d option; otherwise, it will list the contents of any directories whose names are five letters and/or digits, rather than listing the names themselves.  Also, you can do

ls -d [[:alnum:]][[:alnum:]][[:alnum:]][[:alnum:]][[:alnum:]]

but that's exactly as much typing as the answer you have now.

Also, if you don't need to use ls in your command, you could use

echo [[:alnum:]][[:alnum:]][[:alnum:]][[:alnum:]][[:alnum:]]

which will list all the matches on one line, or

printf "%s\n" [[:alnum:]][[:alnum:]][[:alnum:]][[:alnum:]][[:alnum:]]

which will list them on separate lines.
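One caveat with both echo and printf: when nothing matches, the shell by default passes the pattern through unchanged, so the pattern itself gets printed. A minimal sketch of a guard, assuming default shell options (no nullglob or failglob) and a made-up scratch directory:

```shell
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch ab.cd sixsix   # deliberately no five-character alphanumeric name

# With no match, the pattern is left untouched and becomes the only argument.
set -- [[:alnum:]][[:alnum:]][[:alnum:]][[:alnum:]][[:alnum:]]

# Treat the result as a real match only if the first word names an existing entry.
if [ -e "$1" ] || [ -L "$1" ]; then
    printf '%s\n' "$@"
    found=yes
else
    found=no
fi
printf 'found=%s\n' "$found"
```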

4

I don't know if you'll count this as "less ugly", but you could use GNU find like so:

 find -maxdepth 1 -regextype posix-extended -regex './[[:alnum:]]{5}'

That will put ./ in front of each entry, though you could strip it with sed.

Depending on what you do next, find may also have the advantage of helping you avoid the troubles of parsing ls output.
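As a sketch of the sed step, assuming GNU find (for -regextype) and a scratch directory with made-up names:

```shell
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch five1 five2 ab.cd sixsix
mkdir dir55

# -regex has to match the whole path, so the leading ./ is part of the
# pattern; sed then removes it from each result. sort is only there to make
# the order predictable, since find does not sort its output.
names=$(find . -maxdepth 1 -regextype posix-extended \
             -regex '\./[[:alnum:]]{5}' | sed 's|^\./||' | sort)
printf '%s\n' "$names"
```

Directories are reported alongside regular files, which is what the question asks for.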

Keeping it with globs, you could rewrite it a little using a POSIX character class (which works without shopt -s extglob), though again I don't know if you'd count it as cleaner:

ls [[:alnum:]][[:alnum:]][[:alnum:]][[:alnum:]][[:alnum:]]
Eric Renouf
  • 18,431
  • I have to use ls which uses glob or extended glob instead of regex. – Saad A Feb 15 '16 at 20:57
  • @SaadA I don't think you'll be able to do too much then, I made one more suggestion, but it's not really much different from what you have – Eric Renouf Feb 15 '16 at 21:07
  • 4
    @SaadA, ls doesn't use globs, it's the shell that expands the globs and passes the resulting list to ls. – Stéphane Chazelas Feb 15 '16 at 22:51
  • 1
    @EricRenouf: The OP wants to find "all files *or directory* of length n", so -type f isn't really appropriate. – G-Man Says 'Reinstate Monica' Feb 16 '16 at 01:26
  • @G-Man good point, thanks for the catch – Eric Renouf Feb 16 '16 at 01:58
  • find * -maxdepth 0 -regex '[[:alnum:]]\{5\}' – muhmuhten Feb 16 '16 at 18:37
  • 1
    @muhmuhten (1) find * is rarely useful, since find already enumerates directories; using another directory-enumerating mechanism (pathname expansion, a.k.a. wildcards/globs) in the same command just complicates matters.  I guess I see why you're doing it — to avoid the ./ — but it would have been nice if you had explained that yourself.  (Comments can be up to 600 characters long.)  (2) Your command doesn't work on my system — I need a -regextype to go with that -regex. – G-Man Says 'Reinstate Monica' Feb 16 '16 at 19:28
2

Build up the expression in a variable:

 e=""; for m in $(seq 1 5); do e="$e[A-Za-z0-9]"; done

Put it in a function to be fancy/reusable:

 alnumglob() {
    local e=""
    for m in $(seq 1 $1) ; do e="$e[A-Za-z0-9]"; done
    echo "$e"
 }

 ls -ld $(alnumglob 5)
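A loop-free variant of the same idea is sketched below. The helper name alnum_pattern is hypothetical, seq is assumed to be available, and N is assumed to be at least 1.

```shell
# alnum_pattern N: print the class [[:alnum:]] N times (hypothetical helper).
alnum_pattern() {
    # printf reuses its format once per argument; %.0s consumes an argument
    # but prints none of it, so only the class text is emitted each time.
    printf '[[:alnum:]]%.0s' $(seq 1 "$1")
    printf '\n'
}

pattern=$(alnum_pattern 5)
printf '%s\n' "$pattern"
```

It would be used like the function above, for example ls -ld -- $(alnum_pattern 5), with the substitution left unquoted so that the shell expands the result as a glob.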
Otheus
  • 6,138
1

If you want to safely determine the length of a string with bash, you should use parameter expansion. ls alone is only capable of glob syntax, which (likely) cannot do what you want. find has different implementations on BSD and some Linuxes and -regextype isn't necessarily a legal flag. The good news is, bash has loops, and these can give you what you want.

for filename in *              # globs all non-hidden names in the current directory
do
    clip=${filename%.*}        # strips the last extension, if any
    if [[ ${#clip} -eq 5 ]]    # test the length of the remaining string
    then
        ls -d -- "$filename"   # call ls to show you the file or directory
    fi
done
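The ${filename%.*} expansion used in the loop removes the shortest trailing match of .*, which matters for names with several dots. A small sketch with a hypothetical file name shows the difference from the %% form:

```shell
filename=archive.tar.gz      # hypothetical name with two extensions

short_strip=${filename%.*}   # removes the shortest suffix matching .*
long_strip=${filename%%.*}   # removes the longest suffix matching .*

printf '%s -> %s (length %s)\n' "$filename" "$short_strip" "${#short_strip}"
printf '%s -> %s (length %s)\n' "$filename" "$long_strip" "${#long_strip}"
```

With the single % form only .gz is dropped, so the length test sees archive.tar rather than archive.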

If instead you want any filename that contains a run of exactly five alphanumeric chars, your method and the other answers won't help: a glob has to match the entire name, so they return only names that are exactly five characters long. For example:

Using ls [a-zA-Z0-9][a-zA-Z0-9][a-zA-Z0-9][a-zA-Z0-9][a-zA-Z0-9]

five1          # matches
five1.txt      # fails
fi_ve.txt      # fails
a.pages        # fails
q_fiver.txt    # fails

To include any file whose name contains a five-character alphanumeric string, and only those files, you can use grep's more universal regex implementation under bash. Although I wouldn't normally recommend it, using \w and \W here can help readability a great deal (where \w is roughly [[:alnum:]] and \W is roughly [^[:alnum:]], except that \w also matches underscores, so use at your own peril).

for filename in *
do
    if (grep -qE '(^|\W)\w{5}($|\W)' <<<"$filename")
    then
        ls -d "$filename"
    fi
done
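If the underscore behaviour of \w is a concern, the same idea can be written with explicit POSIX classes. The sketch below only illustrates the pattern on a hypothetical list of names fed to grep one per line; it is not a replacement for the per-file loop, which also copes with unusual names.

```shell
# Hypothetical sample names, one per line.
names='five1.txt
fi_ve.txt
a.pages
q_fiver.txt'

# A run of exactly five alphanumerics, bounded by the start or end of the
# name or by any non-alphanumeric character. Unlike \w, an underscore counts
# as a boundary here rather than as part of the run.
pattern='(^|[^[:alnum:]])[[:alnum:]]{5}($|[^[:alnum:]])'

matched=$(printf '%s\n' "$names" | LC_ALL=C grep -E "$pattern")
printf '%s\n' "$matched"
```

Note how the choice changes the results: fi_ve.txt no longer matches, while q_fiver.txt now does because of the run fiver between the underscore and the dot.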
terdon
  • 242,166
Ryder
  • 284
0

If all you want is to make it less ugly, then you can change [a-zA-Z0-9] into [[:alnum:]], then assign that to a variable:

a='[[:alnum:]]'

And use it five times:

$ ls $a$a$a$a$a             ### important that the vars are not quoted.  
five1 five2 five3