
I have a large (~300) set of .csv files, each of which is ~200k lines long, with a regular filename pattern:

outfile_n000.csv
outfile_n001.csv
outfile_n002.csv
.
.
.
outfile_nXXX.csv

I need to extract a range of lines (100013-200013) from each file and save that extracted region to a new .csv file, adding a ptally_ prefix to the filename to differentiate it from the original, while preserving the original file.

I know that I can use

sed -n '100013,200013p' outfile_nXXX.csv > ptally_outfile_nXXX.csv

to do this to a single file, but I need a way to automate it for large batches of files. I can get close by using sed's -i option:

sed -iptally_* -n '100013,200013p' outfile_nXXX.csv > ptally_outfile_nXXX.csv

but this writes the extracted lines to outfile_nXXX.csv and leaves the original content in a backup named ptally_outfile_nXXX.csv, since that is exactly what -i is designed to do.
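(For reference, that matches GNU sed's documented behaviour: a * in the -i backup suffix is replaced by the file name, so the prefix ends up on the backup copy rather than on the extracted output. Illustration with a single file, not a solution:)

sed -i'ptally_*' -n '100013,200013p' outfile_n000.csv
# afterwards:
#   outfile_n000.csv         holds lines 100013-200013 (edited in place)
#   ptally_outfile_n000.csv  is the backup holding the original, full-length file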

Likewise, brace expansion in bash won't do the trick, as brace expansion and wildcards don't mix:

sed -n '100013,200013p' *.csv > {,ptally_}*.csv
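(A quick check shows why: brace expansion runs before pathname expansion, so the braces just produce two unrelated globs, and bash rejects a redirection that expands to more than one word as an "ambiguous redirect".)

echo {,ptally_}*.csv
# the braces expand first into two separate globs:  *.csv  ptally_*.csv
# each is then matched independently -- no pairing of original and ptally_ names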

Any elegant ways to combine the extraction and renaming into a simpler process? Currently, I'm using a bash script to perform the swap between the outfile_nXXX.csv and ptally_outfile_nXXX.csv filenames, but I would prefer a more straightforward workflow. Thanks!


2 Answers


Use a for loop.

for f in outfile_n???.csv; do
  sed -n '100013,200013p' "$f" > ptally_"$f"
done

Alternatively, depending on your exact requirements, it may be more appropriate to use csplit. Some of the GNU extensions extend its power considerably.
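For example, a minimal sketch assuming GNU csplit (the piece_ prefix and the cleanup of the unwanted pieces are illustrative choices, not anything csplit requires):

for f in outfile_n???.csv; do
  # split at lines 100013 and 200014: piece_00 = 1..100012, piece_01 = 100013..200013, piece_02 = the rest
  csplit -s -f piece_ "$f" 100013 200014
  mv piece_01 "ptally_$f"     # keep the middle piece under the new name
  rm -f piece_00 piece_02     # discard the leading and trailing pieces
done

(If a file could end before line 200014, add csplit's -k option so the pieces already written are kept rather than removed on the resulting error.)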

– Wildcard

Not sed, but quite an elegant way:

awk 'NR >= 100013 && NR <= 200013 { print > ("ptally_" FILENAME) }' outfile_nXXX.csv

To bulk-extract into appropriately named new files, use FNR (the per-file line number, which resets for each input file) instead of NR:

awk 'FNR >= 100013 && FNR <= 200013 { print > ("ptally_" FILENAME) }' outfile_n*
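With ~300 input files, note that some awk implementations can run out of open file descriptors, because each print > file keeps its target open (GNU awk usually copes by multiplexing descriptors). A minimal variant of the same idea that closes the previous output whenever a new input file starts:

awk 'FNR == 1 { if (out) close(out); out = "ptally_" FILENAME }
     FNR >= 100013 && FNR <= 200013 { print > out }' outfile_n*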

Also, you can store the filename in a variable before passing it to sed:

filename="outfile_nXXX.csv"

sed -n '100013,200013p' "$filename" > "ptally_$filename"
– MiniMax

  • XXX is a placeholder, not part of the actual name of the file. Your last example with storing it in a variable doesn't make sense at all given that there are around 300 files to be handled. – Wildcard Oct 06 '17 at 04:24
  • @Wildcard It was just an example, not a working solution; an idea. Sure, it should be a for loop iterating through all 300 files, and outfile_nXXX.csv represents the general file name. Anyway, the awk solution is more elegant, so I didn't develop the bash way any further :) – MiniMax Oct 06 '17 at 10:06