Grep over multiple files redirecting to a different filename each time

Question

I have a directory full of .tsv files and I want to run a grep command on each of them to pull out a certain group of text lines and then save it to an associated text file with a similar file name. So for example, if I was grepping just one of the files, my grep command looks like this:

grep -h 8-K 2008-QTR1.tsv > 2008Q1.txt

But I have a list of tsv files that look like:

2008-QTR1.tsv
2008-QTR2.tsv
2008-QTR3.tsv
2008-QTR4.tsv
2009-QTR1.tsv
2009-QTR2.tsv
2009-QTR3.tsv
...

And after grepping they need to be stored as:

2008Q1.txt
2008Q2.txt
2008Q3.txt
2008Q4.txt
2009Q1.txt
2009Q2.txt
2009Q3.txt

Any thoughts?

The grep pattern is always the same? – schrodingerscatcuriosity Nov 07 '19 at 17:10
Yes, the grep pattern is always the same, thank you! – jtyun Nov 08 '19 at 20:06

Jeff Schaller · Answer 1 · 2019-11-07T19:49:45.297

12

In ksh93/bash/zsh, with a simple for loop and parameter expansion:

for f in *-QTR*.tsv
do 
  grep 8-K < "$f" > "${f:0:4}"Q"${f:8:1}".txt
done

This runs grep on one file at a time. The list of files comes from a wildcard pattern that requires "-QTR" in the filename as well as a ".tsv" ending. The output of each run is redirected to a filename built from:

the first four characters of the filename -- the year
the letter Q
the 9th character of the filename -- the quarter
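
A quick way to check this name mapping before running grep is to print it for a few sample names. This is a minimal sketch assuming bash (substring expansion of the form ${var:offset:length} is not POSIX); the sample names are taken from the question, and the variable name out is introduced here only for illustration.

```bash
#!/usr/bin/env bash
# Dry run: show which output name each input name maps to.
# No files are read or written.
for f in 2008-QTR1.tsv 2008-QTR4.tsv 2009-QTR3.tsv
do
  # characters 1-4 (the year) + "Q" + character 9 (the quarter)
  out="${f:0:4}Q${f:8:1}.txt"
  printf '%s -> %s\n' "$f" "$out"
done
```

Note that the offsets are fixed, so this only works while every filename keeps the same YYYY-QTRn.tsv layout.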

edited Nov 07 '19 at 19:49

answered Nov 07 '19 at 17:39

Jeff Schaller


<"$f" is unnecessary here and could be just "$f" – D. Ben Knoble Nov 07 '19 at 19:22

@D.BenKnoble Stéphane edited that in, but it's useful if you want consistent error messages (from your shell, versus from the various utilities); see also https://unix.stackexchange.com/a/458268/117549 – Jeff Schaller Nov 07 '19 at 19:32
Fascinating, thanks. – D. Ben Knoble Nov 07 '19 at 19:37
This worked, thank you!!! – jtyun Nov 08 '19 at 20:11

score 5 · Answer 2 · edited Nov 07 '19 at 19:42


The obligatory POSIX sh variant:

#! /bin/sh -
ret=0
for file in [[:digit:]][[:digit:]][[:digit:]][[:digit:]]-QTR[1234].tsv; do
  base=${file%.tsv}
  grep 8-K < "$file" > "${base%%-*}Q${base##*-QTR}".txt || ret=$?
done
exit "$ret"
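
For readers unfamiliar with the expansions used in that script, the following sketch shows how the output name is derived, one step at a time, for a single sample name from the question. It uses only POSIX parameter expansion; the variable names year, quarter and out are introduced here for illustration and are not part of the answer.

```sh
#!/bin/sh
file=2008-QTR1.tsv
base=${file%.tsv}        # remove the trailing ".tsv"            -> 2008-QTR1
year=${base%%-*}         # remove the longest suffix matching -* -> 2008
quarter=${base##*-QTR}   # remove the longest prefix matching *-QTR -> 1
out="${year}Q${quarter}.txt"
printf '%s\n' "$out"     # prints 2008Q1.txt
```

Unlike fixed character offsets, these expansions key on the "-" and "-QTR" markers, so they do not depend on the year being exactly four characters long.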

edited Nov 07 '19 at 19:42

Jeff Schaller


answered Nov 07 '19 at 17:49

Stéphane Chazelas


Why <"$f"? Redirection not necessary there. – D. Ben Knoble Nov 07 '19 at 19:23

@D.BenKnoble, using redirections has many advantages over passing the file as argument. Note that it's also generally less work, so I'd say it's more the passing the file as argument that is not necessary here (as in this case, we don't need grep to know the name of the file, just its contents). See When should I use input redirection? – Stéphane Chazelas Nov 07 '19 at 19:43
Thank you so much!!! – jtyun Nov 08 '19 at 20:11

bu5hman · Answer 3 · 2019-11-07T19:32:32.123

2

Another option

for f in 200{8..9}-QTR{1..4}.tsv; do
    grep "pattern" "$f" > "$(sed "s/[-RTtsv]*//g" <<< "$f")txt"
done

Walkthrough: Set up an expansion that creates a list of your filenames

200{8..9}-QTR{1..4}.tsv

expands to

2008-QTR1.tsv 2008-QTR2.tsv 2008-QTR3.tsv 2008-QTR4.tsv 2009-QTR1.tsv 2009-QTR2.tsv 2009-QTR3.tsv 2009-QTR4.tsv

and to do every year and quarter to date would be

20{08..19}-QTR{1..4}.tsv

Iterate over the list for..do..done, extract the pattern you are looking for from the file

grep "pattern" "$f"

and redirect to the new filename formed by deleting the unwanted characters with sed and adding the txt suffix

"$(sed "s/[-RTtsv]*//g" <<< "$f")txt"

or

"$(sed "s/[-RT]*//g" <<< "${f%%.*}.txt")"
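
Brace expansion generates every name in the range whether or not the file exists, so a guard can be added to skip the missing ones. The following is a minimal sketch assuming bash; it runs in a scratch directory with made-up sample data (the file contents are not from the question), and the variable names tmp and out are illustrative.

```bash
#!/usr/bin/env bash
# Demo in a scratch directory; the sample rows below are invented.
tmp=$(mktemp -d) && cd "$tmp" || exit 1
printf 'a\t8-K\nb\t10-Q\n' > 2008-QTR1.tsv
printf 'c\t8-K\n'          > 2009-QTR2.tsv

for f in 200{8..9}-QTR{1..4}.tsv; do
    [ -e "$f" ] || continue                     # skip names that do not exist
    out="$(sed 's/[-RTtsv]*//g' <<< "$f")txt"   # 2008-QTR1.tsv -> 2008Q1.txt
    grep 8-K "$f" > "$out"
done
ls
```

Only the two files that exist are processed; the other six generated names are skipped silently instead of producing "No such file" errors and empty output files.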

edited Nov 07 '19 at 19:32

answered Nov 07 '19 at 19:19

bu5hman


It should be noted that this brace-expansion idea hard-codes the expected filenames; it would not pick up newer or older files, and would complain of missing files in the range. Not a deficiency, except the OP showed a file name listing ending in "..." – Jeff Schaller Nov 07 '19 at 19:36
Very true, but so does @stephanechazelas to an extent. What this doesn't do is assume that every tsv file is required, nor that the substrings extracted will conform to OP's pattern. Bananannanan-QTRanana.tsv won't, that's for sure. What it does do is allow OP to process a selected subset of known files. Swings and roundabouts. – bu5hman Nov 07 '19 at 19:46
Indeed, the three of us so far came up with different approaches; I just enjoy adding a little explanation about how it works so that the OP (or future readers) understand why it works so that they know if they can adapt it to their situation. – Jeff Schaller Nov 07 '19 at 19:48
Anyone for golf? – bu5hman Nov 07 '19 at 19:49
Thank you so much!!! – jtyun Nov 08 '19 at 20:11

score 0 · Answer 4 · answered Nov 10 '19 at 21:10

If you want to avoid an explicit loop, here is one possible solution, which someone may well be able to improve:

ls -1 *.tsv | xargs -n1 -I'{}' bash -c 'f="{}";grep 8-K $f > ${f//[^0-9Q]/}.txt'

1. ls just lists the files you want to process
2. xargs processes each of these files, one by one (-n1)
3. a bash shell is launched to be able to process the strings (cf. point 5)
4. the filename is assigned to the variable $f
5. ${f//[^0-9Q]/} removes all the characters you don't want in the .txt filenames (so this is specific to your example)

Pros: - Simple one liner

Cons: - A bash process is started for each file processed

There may be a similar solution that does not use bash, but I don't know of one (for example, eval shouldn't work in this context).
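
One possible hardening of the xargs pipeline, offered as a sketch rather than a definitive fix: pass each filename to the inner shell as a positional argument instead of substituting it into the command string, and separate the names with NUL bytes so that unusual filenames survive intact. This assumes an xargs that supports -0 (both GNU and BSD xargs do) and uses made-up sample data in a scratch directory.

```sh
# Demo in a scratch directory; the sample rows below are invented.
tmp=$(mktemp -d) && cd "$tmp" || exit 1
printf 'a\t8-K\nb\t10-Q\n' > 2008-QTR1.tsv
printf 'c\t8-K\n'          > 2008-QTR2.tsv

# Each name arrives in the inner bash as $1; the trailing "bash" fills $0.
printf '%s\0' *.tsv |
  xargs -0 -n1 bash -c 'f=$1; grep 8-K "$f" > "${f//[^0-9Q]/}.txt"' bash
ls
```

A bash process is still started for each file, so the cost noted under "Cons" is unchanged; the difference is only that the filename is never interpreted as shell code.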

Alternative with awk: grep -H 8-K *.tsv | awk -F ':' -v OFS=':' '{fn=gensub(/[^0-9Q]/,"","g",$1) ".txt";for(i=1;i<NF;i++){$i=$(i+1)};NF--;print $0 > fn}', but with awk we are moving into a different world from bash. – Jacques Nov 10 '19 at 22:02
Pure awk, something like: awk '/8-K/{print > (gensub(/[^0-9Q]/,"","g",FILENAME) ".txt")}' *.tsv – Jacques Nov 10 '19 at 22:15
