12

I have a directory full of .tsv files and I want to run a grep command on each of them to pull out a certain group of text lines and then save it to an associated text file with a similar file name. So for example, if I was grepping just one of the files, my grep command looks like this:

grep -h 8-K 2008-QTR1.tsv > 2008Q1.txt

But I have a list of tsv files that look like:

2008-QTR1.tsv
2008-QTR2.tsv
2008-QTR3.tsv
2008-QTR4.tsv
2009-QTR1.tsv
2009-QTR2.tsv
2009-QTR3.tsv
...

And after grepping they need to be stored as:

2008Q1.txt
2008Q2.txt
2008Q3.txt
2008Q4.txt
2009Q1.txt
2009Q2.txt
2009Q3.txt

Any thoughts?

Jeff Schaller
jtyun

4 Answers

12

In ksh93/bash/zsh, with a simple for loop and parameter expansion:

for f in *-QTR*.tsv
do 
  grep 8-K < "$f" > "${f:0:4}"Q"${f:8:1}".txt
done

This runs the grep on one file at a time (where that list of files is generated from a wildcard pattern that requires "-QTR" to exist in the filename as well as a ".tsv" ending to the filename), redirecting the output to a carefully-constructed filename based on:

  • the first four characters of the filename -- the year
  • the letter Q
  • the 9th character of the filename -- the quarter
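As a minimal sketch of the substring expansions described above, here they are applied to a single sample name taken from the question. The variable names year, quarter and out are only for illustration, and the ${var:offset:length} form assumes ksh93, bash or zsh (it is not POSIX sh):

```shell
f=2008-QTR1.tsv
year=${f:0:4}       # 4 characters starting at offset 0 -> 2008
quarter=${f:8:1}    # 1 character at offset 8 (the 9th character) -> 1
out="${year}Q${quarter}.txt"
echo "$out"         # prints 2008Q1.txt
```

The offsets are fixed, so this only works while every filename keeps the YYYY-QTRn.tsv layout.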
Jeff Schaller
5

The obligatory POSIX sh variant:

#! /bin/sh -
ret=0
for file in [[:digit:]][[:digit:]][[:digit:]][[:digit:]]-QTR[1234].tsv; do
  base=${file%.tsv}
  grep 8-K < "$file" > "${base%%-*}Q${base##*-QTR}".txt || ret=$?
done
exit "$ret"
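To see how the output name is assembled in the script above, the expansions can be tried step by step on one sample filename. This is only an illustrative breakdown; the intermediate variables year, quarter and out are not part of the original script:

```shell
file=2008-QTR1.tsv
base=${file%.tsv}          # strip the shortest trailing match of .tsv  -> 2008-QTR1
year=${base%%-*}           # strip the longest trailing match of -*     -> 2008
quarter=${base##*-QTR}     # strip the longest leading match of *-QTR   -> 1
out="${year}Q${quarter}.txt"
printf '%s\n' "$out"       # prints 2008Q1.txt
```

These are all standard POSIX parameter expansions, so the same lines should behave identically in any sh-compatible shell.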
Jeff Schaller
2

Another option

for f in 200{8..9}-QTR{1..4}.tsv; do
    grep "pattern" "$f" > "$(sed "s/[-RTtsv]*//g" <<< "$f")txt"
done

Walkthrough: Set up an expansion that creates a list of your filenames

200{8..9}-QTR{1..4}.tsv

expands to

2008-QTR1.tsv 2008-QTR2.tsv 2008-QTR3.tsv 2008-QTR4.tsv 2009-QTR1.tsv 2009-QTR2.tsv 2009-QTR3.tsv 2009-QTR4.tsv

and to do every year and quarter to date would be

20{08..19}-QTR{1..4}.tsv

Iterate over the list with for..do..done, extract the lines matching your pattern from each file

grep "pattern" "$f"

and redirect to the new filename formed by deleting the unwanted characters with sed and adding the txt suffix

"$(sed "s/[-RTtsv]*//g" <<< "$f")txt"

or

"$(sed "s/[-RT]*//g" <<< "${f%%.*}.txt")"
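Brace expansion generates every name in the range whether or not the file exists, so a guard can skip the missing ones. The following is a rough sketch under stated assumptions: the scratch directory and the two sample files are made up for illustration, and the here-string and brace expansion assume bash (or zsh):

```shell
# Hypothetical sample data in a scratch directory, for illustration only.
tmp=$(mktemp -d) && cd "$tmp" || exit 1
printf 'a\t8-K\nb\t10-Q\n' > 2008-QTR1.tsv
printf 'c\t8-K\n' > 2009-QTR3.tsv

for f in 200{8..9}-QTR{1..4}.tsv; do
    [ -e "$f" ] || continue                        # skip generated names with no file behind them
    out=$(sed 's/[-RT]*//g' <<< "${f%%.*}.txt")    # 2008-QTR1.tsv -> 2008Q1.txt
    grep 8-K "$f" > "$out"
done
ls
```

Only 2008Q1.txt and 2009Q3.txt are created next to the two input files; the six names without a matching .tsv file are silently skipped instead of producing errors and empty output files.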
bu5hman
  • It should be noted that this brace-expansion idea hard-codes the expected filenames; it would not pick up newer or older files, and would complain of missing files in the range. Not a deficiency, except the OP showed a file name listing ending in "..." – Jeff Schaller Nov 07 '19 at 19:36
  • Very true, but so does @stephanechazelas to an extent. What this doesn't do is assume that every tsv file is required, nor that the substrings extracted will conform to OP's pattern. Bananannanan-QTRanana.tsv won't, that's for sure. What it does do is allow OP to process a selected subset of known files. Swings and roundabouts. – bu5hman Nov 07 '19 at 19:46
  • Indeed, the three of us so far came up with different approaches; I just enjoy adding a little explanation about how it works so that the OP (or future readers) understand why it works so that they know if they can adapt it to their situation. – Jeff Schaller Nov 07 '19 at 19:48
  • Anyone for golf? – bu5hman Nov 07 '19 at 19:49
  • Thank you so much!!! – jtyun Nov 08 '19 at 20:11
0

If you want to avoid an explicit loop, here is one possible solution. Someone may be able to improve on it.

ls -1 *.tsv | xargs -I'{}' bash -c 'f=$1; grep 8-K "$f" > "${f//[^0-9Q]/}.txt"' bash '{}'
  1. ls just lists the files you want to process
  2. xargs processes these files one by one (-I runs the command once per input line)
  3. a bash shell is launched to be able to process the strings (cf. point 5)
  4. the filename is passed to that shell as $1 and stored in the variable $f
  5. ${f//[^0-9Q]/} removes all the characters you don't want in the .txt filenames (so this is specific to your example)
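The substitution in point 5 can be checked on its own with one sample filename from the question. This is just an illustrative sketch; the variable out is introduced here for the demonstration, and the ${var//pattern/} form assumes bash, ksh93 or zsh:

```shell
f=2008-QTR1.tsv
out="${f//[^0-9Q]/}.txt"   # delete every character that is not a digit or Q
echo "$out"                # prints 2008Q1.txt
```

Any other digit or capital Q in a filename would also survive the substitution, which is why this only suits names shaped like the ones in the question.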

Pros: - Simple one liner

Cons: - A bash process is started for each file processed

A similar solution that avoids bash may exist, but I don't know of one (eval, for example, shouldn't work in this context).

Jacques
  • Alternative with awk: grep -H 8-K *.tsv | awk -F ':' -v OFS=':' '{fn=gensub(/[^0-9Q]/,"","g",$1) ".txt";for(i=1;i<NF;i++){$i=$(i+1)};NF--;print $0 > fn}', but with awk we are entering a different world from bash. – Jacques Nov 10 '19 at 22:02
  • Pure awk, something like: awk '/8-K/{print > (gensub(/[^0-9Q]/,"","g",FILENAME) ".txt")}' *.tsv – Jacques Nov 10 '19 at 22:15