Do anchors only work with grep or can it be used with other commands?
For example:
ls -l ^cat
Regular expression anchors such as ^ and $ are only parsed by tools which implement regular expressions. ls is not such a tool, and so no, it cannot use them. However, any binary invoked from the shell can use shell globbing, which is a simpler, albeit less powerful, wildcard-based search mechanism.
For example, for a list of all files whose names start with cat:
$ ls cat*   # lists all files whose names start with 'cat'
$ ls *dog   # lists all files whose names end with 'dog'
$ ls d*y    # lists all files whose names start with 'd' and end with 'y', e.g. 'donkey'
$ ls p?g    # lists all files whose names start with 'p', have one additional character, and end with 'g', e.g. 'pig' and 'pug'
For globbing purposes, * means 'zero or more characters', while ? means 'precisely one character'.
The /bin/ls program, just like any other program, doesn't deal with * patterns. Globbing is done by Unix shells, not by the programs they are running. Let me explain a bit.
The ls program (which has several free-software implementations, such as the one in the GNU coreutils package, so feel free to study its source code) gets a sequence of strings (as the second argument to its main function) which has already been expanded by your shell (often bash). This is not specific to ls; it is true for every program started by your shell (usually found through your PATH).
For example, in a directory containing the files a.c, a.o, b.c, d.c and e.h, the command ls *.c is expanded by the shell into ls a.c b.c d.c, so (in that case) ls gets 4 arguments: the first (index 0) is ls, the second (index 1) is a.c, and so on, the 4th (index 3) being d.c. So the ls program never sees *.c; the shell is execve(2)-ing it with four arguments. That expansion of program arguments, done by your shell, is called globbing. Read glob(7) and be aware of the role of the IFS shell variable.
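The expansion described above can be observed directly. The following is a minimal sketch, assuming a POSIX-like shell such as bash or dash and a scratch directory created with mktemp; show_args is a hypothetical helper, not a standard command. Note that "$#" does not count the command name itself (index 0), so it reports 3 where the text above counts 4.

```shell
# Work in a scratch directory so the demonstration touches nothing else.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch a.c a.o b.c d.c e.h

# Hypothetical helper: reports how many arguments it received and what they were.
show_args() {
    printf '%s argument(s):' "$#"
    for arg in "$@"; do
        printf ' [%s]' "$arg"
    done
    printf '\n'
}

# Unquoted: the shell expands the pattern before show_args is called.
unquoted=$(show_args *.c)
# Quoted: globbing is suppressed, so the pattern arrives verbatim.
quoted=$(show_args '*.c')

printf '%s\n%s\n' "$unquoted" "$quoted"

# Clean up the scratch directory.
cd / && rm -rf "$demo_dir"
```

The first line printed lists a.c, b.c and d.c as three separate arguments; the second shows a single argument, the literal string *.c.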
(Notice that in some cases, startup files like your $HOME/.bashrc might define ls as an alias; then replace ls with \ls, command ls or /bin/ls to avoid the alias expansion. The latter two also work for circumventing an ls redefined as a function.)
A very useful trick to understand what expansion happens is first to use the auto-completion features of your shell (e.g. with your TAB key), or to replace the command (in your case ls) with echo.
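As a rough illustration of the echo trick, the sketch below creates a few example files (the names are made up for this demonstration) in a scratch directory and prints what each command line would look like after expansion. It assumes bash or a POSIX sh, where ^ has no special meaning inside a word.

```shell
# Scratch directory with a few made-up file names.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch cat catalog bulldog peg pig pug

# Replacing the command with echo shows the argument list the shell would pass.
starts_with_cat=$(echo ls -l cat*)
one_char_between=$(echo ls p?g)
# ^cat is not a glob pattern in bash or POSIX sh, so it is passed through unchanged.
caret_literal=$(echo ls -l ^cat)

printf '%s\n' "$starts_with_cat" "$one_char_between" "$caret_literal"

# Clean up the scratch directory.
cd / && rm -rf "$demo_dir"
```

The last line of output is ls -l ^cat itself: in these shells ls would be asked for a file literally named ^cat, which is why the command in the question finds nothing.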
So what you really want is a shell with extended globbing. You can find several, or you can write your own shell to do it:
zsh has extended globbing;
scsh has very different expansion (you can code it in Scheme);
or adapt an existing shell (most are free software, whose source code you can study) to fit your needs, or write your own.
FWIW, sash is a very simple and small shell (a bit buggy) whose source code is easy to read.
With POSIX shells like bash (and also with others, like fish) you can use command substitution, e.g. using find(1), or even some ls piped into some grep, like ls -l $(ls | grep '^foo'). The latter is a bit useless, since it does the same as ls -l foo*, and it won't work in some cases, such as file names with spaces or newlines.
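To make the difference concrete, here is a small sketch that counts how many arguments each approach hands to a command when one of the matching names contains a space. count_args is a hypothetical helper, the file names are invented, and -maxdepth is a widely available find extension rather than part of POSIX.

```shell
# Scratch directory with one name containing a space.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch 'foo bar' foobaz other

# Hypothetical helper: prints the number of arguments it received.
count_args() { echo "$#"; }

# Command substitution: the output is split on IFS, so 'foo bar' becomes two words.
via_substitution=$(count_args $(ls | grep '^foo'))
# Glob: every matching name stays a single argument.
via_glob=$(count_args foo*)
# find -exec: names are passed directly to the command, with no word splitting.
via_find=$(find . -maxdepth 1 -name 'foo*' -exec sh -c 'echo "$#"' sh {} +)

printf 'substitution=%s glob=%s find=%s\n' "$via_substitution" "$via_glob" "$via_find"

# Clean up the scratch directory.
cd / && rm -rf "$demo_dir"
```

Only two files match, yet the command substitution variant delivers three arguments, none of which is the real name 'foo bar'.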
Some commands do globbing with some (often quoted) argument or data, e.g. find(1); and you can write such programs, e.g. using glob(3) or wordexp(3). BTW, you can use regular expressions (used by grep(1) etc...) too in your programs, with the regex(3) functions.
Notice that writing your own shell is a very interesting exercise that I strongly recommend doing once. You'll need to learn how to use syscalls(2), e.g. by reading some book like Advanced Linux Programming, and implement your own globbing. And understanding (with the help of strace(1)) what system calls are done by some shell is worthwhile too.
Actually, if ls ^cat means to you "list all files whose name starts with cat", you could just type the ls cat* command to your shell.
BTW, I really prefer zsh as my interactive shell (because IMHO its autocompletion works better, and its extended expansion is really useful), but it is a matter of taste, so YMMV. But you could try it (and if you adopt it, change your login shell with chsh(1)).
Lastly, I recommend avoiding spaces (and newlines, control characters and most punctuation) in your own file names (so just use Latin letters and digits, dot ., percent %, underscore _, plus +, non-initial dash - or tilde ~ ...), but when coding shell scripts for general use by others, think of file names with spaces and weird characters. Using "$@" in such scripts is then better.
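A short sketch of why "$@" matters: the two hypothetical wrapper functions below forward their arguments to a helper that counts them, once quoted and once unquoted. The file names are invented for the example.

```shell
# Hypothetical helper: prints the number of arguments it received.
count_args() { echo "$#"; }

# Forwards the arguments exactly as received.
forward_quoted() { count_args "$@"; }
# Unquoted: each argument is split again on IFS (and subject to globbing).
forward_unquoted() { count_args $@; }

quoted=$(forward_quoted 'my file.txt' notes.txt)
unquoted=$(forward_unquoted 'my file.txt' notes.txt)

printf 'quoted=%s unquoted=%s\n' "$quoted" "$unquoted"
```

With the quoted form the two names arrive as two arguments; the unquoted form breaks 'my file.txt' apart and delivers three.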
PS. On Windows (which I don't know) things are rumored to be different, and globbing would be done by some startup code à la crt0 or maybe in your main. You could read Operating Systems: Three Easy Pieces to get some broader view.
ls -l $(ls |grep ^foo) will fail if a selected filename contains an IFS character (by default SP HT NL) and may fail if a selected filename contains a glob char (? * [..] and sometimes others depending on shell, unless set -f or set -o noglob); ls -l foo* is safe in those cases. (Both fail if the expanded list exceeds maxargs -- unless ls is builtin which it could be.) And a few programs do their own patterns or regexps, which often must be quoted to get past the shell: find grep sed awk perl et cetera.
– dave_thompson_085
Oct 17 '17 at 03:40
ls -d ~(E)^foo for instance (E for ERE, P for perl-like RE, G for BRE, X for augmented RE).
– Stéphane Chazelas
Oct 17 '17 at 08:17
In zsh -o extendedglob, ls ^foo passes all non-hidden file names except foo to ls (^ is a "not" glob operator), while in the Bourne shell ls ^foo would be the same as ls | foo (^ was the pipe operator before that changed to |), and in fish, that's ls 2> foo (^ redirects stderr).
– Stéphane Chazelas
Oct 17 '17 at 08:59
With the ast-open implementation of ls, you can use the same patterns as in ksh93 globs for its --ignore option, so you can do ls --ignore='~(E)^foo'
– Stéphane Chazelas
Oct 17 '17 at 09:06
As others have explained, ls does not provide support for regular expressions. However, it is possible to list files matching a certain regular expression with GNU find, as I demonstrate below.
First, you can use the -ls action:
find * -maxdepth 0 -regex "ANY_REGEX" -ls
You could also use the -exec action, which allows you to use any command:
find * -maxdepth 0 -regex "ANY_REGEX" -exec ls -la {} \+
Finally, you could combine find with xargs, e.g. like this:
find * -maxdepth 0 -regex "ANY_REGEX" -print0 | xargs -0 ls -la
By default, find will search all subdirectories, which is why, to replicate the behaviour of ls, I have added the filter -maxdepth 0.
The action -print0 of find and the option -0 of xargs in the last example are necessary to handle file names that contain spaces. Thanks to @ilkkachu for pointing it out.
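One detail worth keeping in mind when choosing ANY_REGEX: find's -regex has to match the whole path, not just part of it, so the expression is effectively anchored at both ends. The sketch below illustrates this with invented file names in a scratch directory; it assumes a find that supports -regex and -maxdepth (GNU find does).

```shell
# Scratch directory with a few made-up file names.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch bobcat cat catalog

# 'cat' must match the entire path, so only the file named exactly 'cat' is found.
exact=$(find * -maxdepth 0 -regex 'cat')
# To express "starts with cat", allow anything after the prefix.
prefix=$(find * -maxdepth 0 -regex 'cat.*')
# To express "ends with cat", allow anything before the suffix.
suffix=$(find * -maxdepth 0 -regex '.*cat')

printf 'exact:\n%s\nprefix:\n%s\nsuffix:\n%s\n' "$exact" "$prefix" "$suffix"

# Clean up the scratch directory.
cd / && rm -rf "$demo_dir"
```

Because the start points come from find *, the paths carry no leading ./ here; with find . the expressions would need to account for that prefix, e.g. '\./cat.*'.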
find. But I agree that it was not very complete. Thank you for the feedback anyway.
– Rastapopoulos
Oct 16 '17 at 18:09
touch "foo bar" ; find -name 'foo*' | xargs ls -l. That's what find -print0 | xargs -0 is for (or just use find -exec ...). Also https://unix.stackexchange.com/questions/131766/why-does-my-shell-script-choke-on-whitespace-or-other-special-characters
– ilkkachu
Oct 17 '17 at 10:46
ls -l got executed without argument. Thank you very much! I have edited my answer.
– Rastapopoulos
Oct 17 '17 at 10:55
These anchors are parts of a regular expression, so you can expect to use them in applications where regex is supported. Your standard bash shell globbing doesn't use regex, so ls -l ^cat will not work.
You can use regular expressions "with" ls, but you have to pass them to grep.
ls | grep ^cat | xargs -r ls -l
Bash itself supports regular expressions only through the comparison operator =~ inside [[ ... ]]; to filter file names with it, you need a for loop and an if clause.
for x in *; do if [[ "$x" =~ ^cat ]]; then ls -l "$x"; fi; done
ls | grep ^cat | tr '\n' '\0' | xargs -0 -r ls -l.
– ceving
Oct 17 '17 at 10:58
^cat, if it is used as an argument to grep? Btw: I wish this site would have an ignore list.
– ceving
Oct 17 '17 at 13:56
^ and $ only and in particular (for some obscure reason); or if you want to match filenames based on regular expression patterns in general (they contain much more than just the ^ and $ anchors). Or in other words, is the point the particular type of pattern, or just matching at start of filename.
– ilkkachu
Oct 17 '17 at 13:43