How to exclude directories from `ls -R` output?

Question

If

ls -aR | grep ":$"

returns

./.hiddendir2:
./dir1:
./dir3:
./dir3/.hiddendir4:

then how can you update the pattern given to grep so that you only get

./dir1:
./dir3:

(note that the last line containing .hiddendir4 was omitted even though its parent was not hidden)

Here is a wordy version of the same question with some context:

I found a great script at http://www.centerkey.com/tree/ for printing a directory tree, but the output is cluttered sometimes by the descendants of the .git directory for repos I'm working on.

If I read the source correctly I need to change the pattern given to grep at this line:

ls -aR | grep ":$"

So far I came up with ^\./(?!\.)\w+:$ which seemed to work in testing at http://regexpal.com/, but then broke when I tried running it in bash. I suspect that a combination of (1) differences between the regex testing tool I'm using and how grep parses a pattern, (2) bash escape character requirements, and (3) pattern design all need to be addressed to solve this. Help appreciated before I end up spending too long on this.

Simple answer: Remove the -a argument – Trindaz May 01 '14 at 23:14
http://mywiki.wooledge.org/XyProblem – n.st May 01 '14 at 23:25

score 5 · Answer 1 · answered May 01 '14 at 01:58

You should really use find for any deeper interrogation of directory contents than what 'ls -l' provides.

find . -type f

The find command is ancient unix magic. It will save you from endless regex wrangling.

Dan Garthwaite

score 2 · Answer 2 · answered May 01 '14 at 23:23

Since you're looking for something that prints a directory tree, why not just use Steve Baker's tree? (That's the one most distributions include in their repositories.)

You should take a look at its manpage, but this might get you started:

tree -a -I '.git'

(include dot-files/-dirs, exclude anything whose name matches the pattern '.git')

score 1 · Answer 3 · answered May 01 '14 at 03:51

To expand on what @dan-garthwaite is saying, find is usually the tool to use when searching directory trees. The general syntax is:

find /path/to/search [one or more expressions]

See the man page for all the possible expressions, but some handy ones are:

-type x         # Where x = f for files, d for directories, s for sockets, etc.
-mtime n        # Where n = number of days (24-hour periods) since the file was last modified
-name pattern   # Where pattern = a shell pattern matched against the file name (use * for wildcard match)
-iname pattern  # Same as -name, but case insensitive
-size n         # Where n = size with a unit suffix (e.g. 500k, 20M, ...)
-user name      # Where name = owner of file

So, for example, to search in /home for all jpg files owned by bob and larger than 5MB, you could run:

find /home -user bob -size +5M -iname '*.jpg'
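
To try combining expressions without touching /home, here is a small sketch that runs against a throwaway directory. The file names (big.JPG, small.jpg, notes.txt) and the 1k threshold are made up for illustration, and -iname and the k suffix are assumed to be available, as they are in GNU and BSD find.

```shell
# Throwaway directory with three hypothetical files.
tmp=$(mktemp -d)
dd if=/dev/zero of="$tmp/big.JPG" bs=1024 count=3 2>/dev/null   # 3 KiB, upper-case extension
dd if=/dev/zero of="$tmp/small.jpg" bs=1 count=10 2>/dev/null   # 10 bytes
: > "$tmp/notes.txt"                                            # empty, wrong extension

# Expressions are ANDed by default: regular file AND larger than 1 KiB
# AND name matching *.jpg regardless of case.
matches=$(find "$tmp" -type f -size +1k -iname '*.jpg')
printf '%s\n' "$matches"

rm -rf "$tmp"
```

Only big.JPG should be listed: small.jpg fails the size test and notes.txt fails the name test.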

score 1 · Accepted Answer · edited Apr 13 '17 at 12:36

The first command you posted can't give the results you show, because they don't have colons at the end; presumably you stripped them. The script you refer to does this to select directory paths, which ls -R displays with a colon appended, but there is nothing preventing a file name & path from ending with a colon and giving a false positive. This also makes your title misleading; you want to keep most directories and exclude only a few.

Question as asked: There are several different "flavors" (standards) of regular expressions, mostly similar but with important differences in the details. Two are common in Unix and Unix-derived software, unimaginatively called Basic Regular Expressions (BRE) and Extended Regular Expressions (ERE). An even simpler form, used by most shells (and by standard find) to match filenames (and in case statements), supports only ? * and [...] and isn't called a regexp at all, just a pattern. A more extended form, defined by Perl but usable outside it, is called Perl Compatible Regular Expressions (PCRE). See Why does my regular expression work in X but not in Y? and http://en.wikipedia.org/wiki/Regular_Expression .

Your (?! "lookahead" exists only in PCRE, while standard grep does BRE by default or ERE with -E. Some versions of grep can do PCRE (GNU grep with -P, for example), or you can install a separate pcregrep. But you don't need it. If you wanted non-hidden children of the current directory, just use '^\./\w' or '^\./[^.]' depending on how strict you want to be. But you say you want no hidden directory anywhere in the path, which is harder to do with a positive regexp and much easier with negative matching like grep -v '/\.' .
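
To make the difference between the positive and the negative pattern concrete, here is a small sketch that feeds the four lines from the question through both. The variable names are only for illustration.

```shell
# Sample input: the `ls -aR | grep ':$'` output shown in the question.
sample='./.hiddendir2:
./dir1:
./dir3:
./dir3/.hiddendir4:'

# Positive match: keeps paths whose first component is not hidden,
# so ./dir3/.hiddendir4: still slips through.
positive=$(printf '%s\n' "$sample" | grep '^\./[^.]')

# Negative match: drops every line that has "/." anywhere in the path.
negative=$(printf '%s\n' "$sample" | grep -v '/\.')

printf 'positive:\n%s\n\nnegative:\n%s\n' "$positive" "$negative"
```

The negative form leaves only ./dir1: and ./dir3:, which is the output the question asks for.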

Backslash is special in both bash (and most if not all shells) and grep (BRE or ERE), so it must be either doubled (\\) or single-quoted; I prefer the latter. Note double quotes are not sufficient here.
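
One way to see what the shell actually hands to grep is to print the argument instead of running grep. This is just a sketch; printf with a '%s' format prints its argument without interpreting backslashes itself.

```shell
# Unquoted: the shell consumes the backslash, so grep would receive /.
# (a slash followed by ANY character).
unquoted=$(printf '%s' /\.)

# Doubled: the shell turns \\ into a single backslash, so grep receives /\.
doubled=$(printf '%s' /\\.)

# Single-quoted: passed through untouched, so grep receives /\.
single=$(printf '%s' '/\.')

printf 'unquoted: %s\ndoubled:  %s\nsingle:   %s\n' "$unquoted" "$doubled" "$single"
```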

Better approaches: you actually want only directory paths, so, as suggested by other answers, find . -type d | grep -v '/\.' is a better approach. It doesn't waste time listing ordinary-file names that you then discard. Alternatively, you can just use ls -R | grep ':$' without the -a; by default ls already skips hidden entries (both ordinary files and directories), which is what the script you refer to does.
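
Here is a sketch of the find approach on a throwaway tree that uses the directory names from the question. The second command, with -prune, is not from this answer; it is an alternative I am assuming fits the same goal, and it has the side benefit of never descending into something like .git at all.

```shell
# Throwaway tree using the directory names from the question.
tmp=$(mktemp -d)
mkdir -p "$tmp/.hiddendir2" "$tmp/dir1" "$tmp/dir3/.hiddendir4"

# Approach from this answer: list directories, then filter out hidden components.
dirs=$(cd "$tmp" && find . -type d | grep -v '/\.' | LC_ALL=C sort)

# Alternative (an assumption, not from the answer): let find skip hidden entries.
# '.?*' matches names that start with a dot and have at least one more
# character, so the starting point "." itself is kept.
pruned=$(cd "$tmp" && find . -name '.?*' -prune -o -type d -print | LC_ALL=C sort)

printf 'grep -v:\n%s\n\n-prune:\n%s\n' "$dirs" "$pruned"
rm -rf "$tmp"
```

Both commands should list ., ./dir1 and ./dir3. Note there is no trailing colon here, since find prints plain paths rather than ls -R style headers.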
