
Do anchors only work with grep or can it be used with other commands?

For example:

ls -l ^cat
Jeff Schaller
  • This has a bit of an XY problem smell on it, and doesn't seem a very good question. You're not telling if you want to match filenames based on the strings they start or end with; or if you want to use ^ and $ only and in particular (for some obscure reason); or if you want to match filenames based on regular expression patterns in general (they contain much more than just the ^ and $ anchors). Or in other words, is the point the particular type of pattern, or just matching at start of filename. – ilkkachu Oct 17 '17 at 13:43

5 Answers


Regular expression anchors such as ^ and $ are only parsed by tools which implement regular expressions. ls is not such a tool, so no, it cannot use them. However, before running any command, the shell applies globbing to its arguments. Globbing is a simpler, albeit less powerful, wildcard-based matching mechanism.

For example, to list all files whose names start with cat, and a few other patterns:

$ ls cat*  # lists all files whose names start with 'cat'
$ ls *dog  # lists all files whose names end with 'dog'
$ ls d*y   # lists all files whose names start with 'd' and end
           # with 'y', e.g. 'donkey'
$ ls p?g   # lists all files whose names start with 'p', have exactly
           # one more character, and end with 'g', e.g. 'pig' and 'pug'

For globbing purposes, * means 'zero or more characters'; while ? means 'precisely one character'.
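You can try the patterns above safely in a throwaway directory. This is a minimal sketch that assumes a POSIX shell and mktemp. The file names are made up. It uses echo instead of ls, so the result of the shell's expansion is printed directly.

```bash
# Work in a throwaway directory so the globs only see these made-up names.
dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1
touch cat catalog hotdog donkey day pig pug plug

# The shell expands each pattern before echo runs; echo just prints the result.
starts_cat=$(echo cat*)   # * also matches zero characters, so 'cat' itself matches
ends_dog=$(echo *dog)
d_to_y=$(echo d*y)
p_one_g=$(echo p?g)       # 'plug' is excluded: ? matches exactly one character

echo "cat*: $starts_cat"
echo "*dog: $ends_dog"
echo "d*y:  $d_to_y"
echo "p?g:  $p_one_g"

cd / && rm -rf "$dir"
```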

DopeGhoti

The /bin/ls program, just like any other program, doesn't deal with * patterns. Globbing is done by Unix shells, not by the programs they run. Let me explain a bit.

The ls program (which has several implementations that are free software, like the one from the GNU coreutils package, so feel free to study its source code) gets a sequence of strings (as second argument to its main function) which has been expanded by your shell (often bash). This is not specific to ls, it is true for every program started by your shell (often by using your PATH).

For example, take a directory containing the files a.c, a.o, b.c, d.c and e.h. There, the shell expands the command ls *.c into ls a.c b.c d.c. So in that case, ls gets 4 arguments: the first (index 0) is ls, the second (index 1) is a.c, and so on, with the fourth (index 3) being d.c. The ls program never sees *.c: the shell execve(2)s it with four arguments. That argument expansion, done by your shell, is called globbing. Read glob(7), and be aware of the role of the IFS shell variable.
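A shell function can stand in for the program to make this expansion visible. This is a minimal sketch assuming a POSIX shell. show_args is a hypothetical helper, and the files are the ones from the example above, created in a throwaway directory.

```bash
# show_args is a hypothetical stand-in for ls: it reports how many arguments
# it received ($#, which, unlike argc, does not count the command name) and
# prints each one in brackets.
show_args() {
    printf '%s argument(s):' "$#"
    printf ' [%s]' "$@"
    printf '\n'
}

dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1
touch a.c a.o b.c d.c e.h

expanded=$(show_args *.c)    # the shell expands *.c before show_args runs
literal=$(show_args '*.c')   # quoted: the pattern reaches the function as-is
echo "$expanded"
echo "$literal"

cd / && rm -rf "$dir"
```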

(Notice that in some cases, startup files like your $HOME/.bashrc might define ls as an alias. In that case, use \ls, command ls or /bin/ls instead of ls to avoid the alias expansion. The latter two also bypass an ls that has been redefined as a function.)

A very useful trick to understand what expansion happens is to first use the auto-completion features of your shell (e.g. with your TAB key), or to replace the command (in your case ls) with echo.

So what you really want is a shell with extended globbing. Several exist, and you could even write your own:

  • zsh has extended globbing

  • scsh has very different expansion (you can code it in Scheme)

  • Try also es or ksh

  • or adapt an existing shell (most are free software, whose source code you can study) to fit your needs, or write your own.

FWIW, sash is a very simple and small shell (a bit buggy) whose source code is easy to read.

With POSIX shells like bash (and also with others, like fish), you can use command substitution, e.g. with find(1). You can even pipe ls into grep, as in ls -l $(ls | grep '^foo'). That last form is rather pointless, since it does the same as ls -l foo*. It also won't work in some cases, such as file names containing spaces or newlines.
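Counting the arguments each form produces shows the difference. This is a minimal sketch assuming a POSIX shell and an ls that prints names unquoted when writing to a pipe (GNU and BSD ls both do). count_args and the file names are hypothetical.

```bash
# count_args is a hypothetical helper that prints how many arguments it got.
count_args() { echo "$#"; }

dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1
touch 'foo bar' foo2 other

via_grep=$(count_args $(ls | grep '^foo'))   # unquoted $(...) is word-split
via_glob=$(count_args foo*)                  # glob results are never split

echo "ls | grep: $via_grep arguments"   # 'foo bar' became 'foo' and 'bar'
echo "foo*:      $via_glob arguments"

cd / && rm -rf "$dir"
```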

Some commands do their own glob matching on some (often quoted) argument or on data, e.g. find(1). You can write such programs yourself, e.g. using glob(3) or wordexp(3). You can also use regular expressions (as used by grep(1) and others) in your own programs, with the regex(3) functions.
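For instance, find(1) matches its -name pattern itself, against the base name of every file it visits, as long as the shell does not expand the pattern first. This is a minimal sketch assuming a POSIX shell. The directory layout and file names are made up for illustration.

```bash
dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1
mkdir sub
touch catalog sub/cathedral dog

# 'cat*' is quoted, so the shell leaves it alone and find does the matching,
# at every depth. Unquoted, the shell would first expand cat* to 'catalog'.
found=$(find . -name 'cat*' -type f | sort)
printf '%s\n' "$found"

cd / && rm -rf "$dir"
```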

Notice that writing your own shell is a very interesting exercise that I strongly recommend doing once. You'll need to learn how to use syscalls(2), e.g. by reading some book like Advanced Linux Programming, and implement your own globbing. And understanding (with the help of strace(1)) what system calls are done by some shell is worthwhile too.

Actually, if by ls ^cat you mean "list all files whose names start with cat", you can just type ls cat* in your shell.

BTW, I really prefer zsh as my interactive shell (because IMHO its autocompletion works better, and its extended expansion is really useful), but it is a matter of taste, so YMMV. But you could try it (and if you adopt it, change your login shell with chsh(1)).

Finally, I recommend avoiding spaces (and newlines, control characters and most punctuation) in your own file names. Just use Latin letters and digits, dot ., percent %, underscore _, plus +, non-initial dash - or tilde ~ ... But when coding shell scripts for general use by others, expect file names with spaces and weird characters. In such scripts, use "$@" (quoted) rather than $* to pass file names along.
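Here is a minimal sketch of why the quoting matters when a script forwards file names, assuming a POSIX shell. The helper and wrapper names are hypothetical.

```bash
# count_args is a hypothetical helper that prints how many arguments it got.
count_args() { echo "$#"; }

# Two hypothetical wrappers that forward all of their arguments.
forward_star() { count_args $*; }     # unquoted $*: each name is word-split
forward_at()   { count_args "$@"; }   # quoted "$@": each name stays one argument

star=$(forward_star 'my file.txt' notes.txt)
at=$(forward_at 'my file.txt' notes.txt)
echo "\$*: $star arguments; \"\$@\": $at arguments"
```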

PS. On Windows (which I don't know) things are rumored to be different, and globbing would be done by some startup code à la crt0 or maybe in your main. You could read Operating Systems: Three Easy Pieces to get some broader view.

  • Even worse, ls -l $(ls |grep ^foo) will fail if a selected filename contains an IFS character (by default SP HT NL) and may fail if a selected filename contains a glob char (? * [..] and sometimes others depending on shell, unless set -f or set -o noglob); ls -l foo* is safe in those cases. (Both fail if the expanded list exceeds maxargs -- unless ls is builtin which it could be.) And a few programs do their own patterns or regexps, which often must be quoted to get past the shell: find grep sed awk perl et cetera. – dave_thompson_085 Oct 17 '17 at 03:40
  • FWIW, ksh93 globs can use regexps, So you can do: ls -d ~(E)^foo for instance (E for ERE, P for perl-like RE, G for BRE, X for augmented RE). – Stéphane Chazelas Oct 17 '17 at 08:17
  • Might be worth noting that in zsh -o extendedglob, ls ^foo is passing all non-hidden file name except foo to ls (^ is a not glob operator) while in the Bourne shell ls ^foo would be the same as ls | foo (^ was the pipe operator before that changed to |) and in fish, that's ls 2> foo (^ redirects stderr). – Stéphane Chazelas Oct 17 '17 at 08:59
  • With the ast-open implementation of ls, you can use the same patterns as in ksh93 globs for its --ignore option, so you can do ls --ignore='~(E)^foo' – Stéphane Chazelas Oct 17 '17 at 09:06

As others have explained, ls does not provide support for regular expressions. However, it is possible to list files matching a certain regular expression with GNU find, as I demonstrate below.

First, you can use the -ls action:

find * -maxdepth 0 -regex "ANY_REGEX" -ls

You could also use the -exec action, which allows you to use any command:

find * -maxdepth 0 -regex "ANY_REGEX" -exec ls -la {} \+

Finally, you could combine find with xargs, e.g. like this:

find * -maxdepth 0 -regex "ANY_REGEX" -print0 | xargs -0 ls -la

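By default, find searches all subdirectories. That is why, to replicate the behaviour of ls, I have added the option -maxdepth 0. Note also that -regex must match the whole path, not just part of it, as if the regex were anchored with ^ and $. Since the starting points given by * have no ./ prefix, that path is simply the file name.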

The action -print0 of find and the option -0 of xargs in the last example are necessary to handle file names that contain spaces. Thanks to @ilkkachu for pointing it out.
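The -print0 | xargs -0 variant can be checked on throwaway files. This is a minimal sketch assuming GNU find; BSD find also has -regex, -maxdepth and -print0. printf stands in for ls -la so the output is easy to compare. The file names and the regex cat.* are made up for illustration.

```bash
dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1
touch 'cat food' catalog bobcat dog

# -regex must match the whole name, so 'bobcat' is not selected even though
# it contains 'cat'; -print0 / xargs -0 keep 'cat food' as a single argument.
matches=$(find * -maxdepth 0 -regex 'cat.*' -print0 | xargs -0 printf '%s\n' | LC_ALL=C sort)
printf '%s\n' "$matches"

cd / && rm -rf "$dir"
```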

  • This does not provide an answer to the question. To critique or request clarification from an author, leave a comment below their post. - From Review – Romeo Ninov Oct 16 '17 at 18:07
  • I would argue that it does, at least partly. The second bit of the question is: can it be used with other commands? My answer shows that yes, it can be used with find. But I agree that it was not very complete. Thank you for the feedback anyway. – Rastapopoulos Oct 16 '17 at 18:09
  • Your last xargs command has bugs with files containing whitespace – Ferrybig Oct 16 '17 at 18:47
  • does it? It seems to work fine on my computer... – Rastapopoulos Oct 16 '17 at 19:30
  • @Rastapopoulos, not files containing whitespace, but file names containing whitespace. Try touch "foo bar" ; find -name 'foo*' | xargs ls -l . That's what find -print0 | xargs -0 is for (or just use find -exec ...). Also https://unix.stackexchange.com/questions/131766/why-does-my-shell-script-choke-on-whitespace-or-other-special-characters – ilkkachu Oct 17 '17 at 10:46
  • Oh, you're right actually. I had tested pretty much exactly that, but had messed up my regular expression, so find returned nothing and ls -l got executed without argument. Thank you very much! I have edited my answer. – Rastapopoulos Oct 17 '17 at 10:55

These anchors are part of regular expression syntax, so you can expect to use them in applications that support regex. Standard bash shell globbing doesn't use regex, so ls -l ^cat will not work: ls just looks for a file literally named ^cat.

Segfault

You can use regular expressions "with" ls, but you have to pass them to grep.

ls | grep '^cat' | xargs -r ls -l

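In Bash itself, regular expressions are available only through the =~ operator of the [[ ... ]] conditional command. Filtering file names therefore means looping over them and testing each one, e.g. with a for loop and an if clause: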

for x in *; do if [[ "$x" =~ ^cat ]]; then ls -l "$x"; fi; done
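A variant of this loop collects the matching names first and runs ls -l only once. This is a minimal sketch that requires bash (arrays and [[ =~ ]]). The file names are made up, and the variable names are illustrative.

```bash
dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1
touch 'cat 1' catalog bobcat dog

re='^cat'      # kept in a variable and used unquoted, so =~ treats it as a regex
matches=()
for x in *; do
    if [[ $x =~ $re ]]; then
        matches+=("$x")    # quoted, so 'cat 1' stays a single element
    fi
done

if (( ${#matches[@]} > 0 )); then
    ls -l -- "${matches[@]}"
fi

cd / && rm -rf "$dir"
```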
ceving
  • that's horribly complex and breaks down if any filenames contain whitespace – ilkkachu Oct 17 '17 at 10:49
  • @ilkkachu when this is already horribly complex, the solution for files with spaces will make you really happy: ls | grep ^cat | tr '\n' '\0' | xargs -0 -r ls -l. – ceving Oct 17 '17 at 10:58
  • or just use ls -l cat* ... – ilkkachu Oct 17 '17 at 11:04
  • @ilkkachu I can not find any regular expression in your example. – ceving Oct 17 '17 at 11:05
  • so, just use the correct tool for the job. – ilkkachu Oct 17 '17 at 11:15
  • @ilkkachu The question was not "what is the right tool" the question was "how to use a regular expression to filter files". – ceving Oct 17 '17 at 12:02
  • You'll note that the words "regular expression" don't appear in the question. At all. – ilkkachu Oct 17 '17 at 13:38
  • @ilkkachu And how would you call this string ^cat, if it is used as an argument to grep? Btw: I wish this site would have an ignore list. – ceving Oct 17 '17 at 13:56
  • the point is that without input from the user posing the question, we can't know if they specifically want to use a regex; or if they're looking for a way to do a pattern match at the start of the string in general. – ilkkachu Oct 18 '17 at 15:23