2

I have a directory with lots of JSON and PDF files whose names follow a pattern. I am trying to filter the files by name using the pattern \d{11}-\d\.(?:json|pdf) in the command below, but it is not working. I believe this is because xargs passes the arguments as one long string, or because the split input still contains whitespace, \n, or a null character.

ls | xargs -d '\n' -n 1 grep '\d{11}-\d\.(?:json|pdf)'

If I try just ls | xargs -d '\n' -n 1 grep '\d', it selects file names with digits in them, but as soon as I add the repetition count to the regex, nothing matches.

CodeWeed
  • 1
    Are you planning to filter the list of filenames, or the contents of the files? Running ... | xargs grep $pattern would run grep $pattern file1 file2 ..., which looks at the contents of the files – ilkkachu Aug 28 '21 at 19:22
  • 1
    It's unclear what you want to achieve. Do you just want to list the filenames? What are some examples of filename that you want to list and that you don't want to list? – Kusalananda Aug 28 '21 at 19:22
  • 2
    You also don't want to parse the output of ls. You haven't clarified what the objective is, but if you want to find files that match certain patterns, you are better off using something along the lines of find /path/to/directory -type f \( -name '*.json' -o -name '*.pdf' \) – Nasir Riley Aug 28 '21 at 19:27
  • @ilkkachu Yes. That would work as well. More clarity is needed on what is expected though. – Nasir Riley Aug 28 '21 at 19:40
  • @ilkkachu No, I am not looking inside the files, but rather on the filename. I am trying to apply the pattern on the filenames and filtering it. – CodeWeed Aug 28 '21 at 22:30
  • @NasirRiley I am trying to filter the file names based on the pattern matches. – CodeWeed Aug 28 '21 at 22:32
  • I have edited the question to make it clearer. I am not sure why it was also showing matched file names with just \d as the regex. Does it look inside the files as well as at the filenames? – CodeWeed Aug 28 '21 at 22:37

2 Answers

6

First, ls | xargs grep 'pattern' makes grep search the contents of the files listed by ls, not the list of filenames. To filter the filenames themselves, it should be enough to do:

ls | grep 'pattern'
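
The difference between the two pipelines can be seen with a small sketch. The scratch directory and the two file names (abc.txt, 123.txt) are made up for illustration, and the simpler pattern [0-9] stands in for the real one:

```shell
# Scratch directory with two hypothetical files.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1
printf 'build 7\n' > abc.txt   # digit in the contents, not in the name
printf 'hello\n'   > 123.txt   # digit in the name, not in the contents

# xargs hands the names to grep as file arguments, so grep reads the contents.
by_content=$(ls | xargs grep '[0-9]')
# Without xargs, grep reads the list of names itself.
by_name=$(ls | grep '[0-9]')

printf 'ls | xargs grep: %s\n' "$by_content"
printf 'ls | grep:       %s\n' "$by_name"

cd / && rm -rf "$tmpdir"
```

The first pipeline reports abc.txt (with its matching line), the second reports 123.txt. This would also explain why the \d attempt appeared to select file names: grep was reporting files whose contents matched.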

Second, grep '\d{11}-\d\.(?:json|pdf)' would work only with GNU grep and its -P option, because \d and (?:...) are Perl-style syntax. Use the following syntax instead; it works with the GNU, busybox and FreeBSD implementations of grep:

ls | grep -E '[[:digit:]]{11}-[[:digit:]]\.(json|pdf)'
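
As a quick check of that expression, here is a minimal sketch that feeds a few made-up names to grep -E through printf instead of ls. The ^ and $ anchors are an addition not present in the command above; they are assumed here so that names with extra leading digits or trailing text are rejected:

```shell
# Hypothetical sample names, used only to exercise the pattern.
matches=$(printf '%s\n' \
    '12345678901-2.json' \
    '12345678901-2.pdf' \
    '123456789012-2.json' \
    '1234-2.json' \
    '12345678901-2.txt' \
  | grep -E '^[[:digit:]]{11}-[[:digit:]]\.(json|pdf)$')

printf '%s\n' "$matches"
```

Only the first two names are printed. Without the anchors, the 12-digit name would also pass, because the pattern may match anywhere in the line.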

Third, parsing ls is not a good idea. Use GNU find:

find . -maxdepth 1 -regextype egrep -regex '.*/[[:digit:]]{11}-[[:digit:]]\.(json|pdf)'

or FreeBSD find:

find -E . -maxdepth 1 -regex '.*/[[:digit:]]{11}-[[:digit:]]\.(json|pdf)'
1

You don't need any of that complexity. Just use a shell glob. This one is for shells such as bash that understand {x,y} braced alternatives:

ls *[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]-[0-9].{json,pdf}

If you want to do something with the matched files, don't take the output of ls but just use the glob to iterate across the files directly.
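
A minimal sketch of such a loop follows, assuming a scratch directory with made-up file names. It spells out one glob per extension instead of using {json,pdf}, so it also runs in shells without brace expansion, and it omits the leading * so that exactly eleven digits are required:

```shell
# Scratch directory with hypothetical sample files.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1
touch 12345678901-2.json 12345678901-3.pdf 1234-2.json notes.txt

count=0
for f in \
    [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]-[0-9].json \
    [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]-[0-9].pdf
do
    # An unmatched glob is left as literal text; skip it.
    [ -e "$f" ] || continue
    printf 'processing %s\n' "$f"
    count=$((count + 1))
done

cd / && rm -rf "$tmpdir"
```

Because the shell expands the glob into separate arguments, names containing spaces or newlines are handled safely, which is not the case when splitting the output of ls.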

Chris Davies
  • 1
    That is a lot of digit regex :). I was looking for a more consistent regex-based solution, as the directory has a lot of files. Thanks for your reply. – CodeWeed Aug 28 '21 at 22:47
  • 1
    It's not a regex; it's a glob used directly by the shell. Try it – Chris Davies Aug 28 '21 at 23:20