I have a large set (~300) of .csv files, each of which is ~200k lines long, with a regular filename pattern:
outfile_n000.csv
outfile_n001.csv
outfile_n002.csv
.
.
.
outfile_nXXX.csv
I need to extract a range of lines (100013-200013) from each file and save that extracted region to a new .csv file, adding a ptally_
prefix to the filename to differentiate it from the original, while preserving the original file.
I know that I can use
sed -n '100013,200013p' outfile_nXXX.csv > ptally_outfile_nXXX.csv
to do this for a single file, but I need a way to automate it for large batches of files. I can get close by using sed's -i
option:
sed -i'ptally_*' -n '100013,200013p' outfile_nXXX.csv
but this writes the extracted lines to outfile_nXXX.csv
and leaves the original file renamed as ptally_outfile_nXXX.csv
, since keeping a backup of the edited file is precisely what -i
is for.
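To make the inversion concrete, here is what a single run leaves behind (assuming GNU sed, where a * in the -i backup suffix is replaced by the current filename, so the "suffix" acts as a prefix):

sed -i'ptally_*' -n '100013,200013p' outfile_n000.csv
# afterwards:
#   outfile_n000.csv        holds the extracted lines 100013-200013
#   ptally_outfile_n000.csv is the untouched original (the backup)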
Likewise, brace expansion in bash won't do the trick: brace expansion happens before pathname expansion, so the braces and the wildcard don't combine the way I'd need (and > can only redirect to a single file anyway):
sed -n '100013,200013p' *.csv > {,ptally_}*.csv
Are there any elegant ways to combine the extraction and renaming into a simpler process? Currently, I'm using a bash script to perform the swap between the outfile_nXXX.csv
and ptally_outfile_nXXX.csv
filenames (roughly the loop sketched below), but I would prefer a more straightforward workflow. Thanks!
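The swap script looks roughly like this (a sketch, assuming the GNU sed backup behavior described above; tmp_swap.csv is an arbitrary scratch name):

for f in outfile_n*.csv; do
    sed -i'ptally_*' -n '100013,200013p' "$f"   # extraction replaces "$f"; original saved as "ptally_$f"
    mv "$f" tmp_swap.csv                        # three-way rename to fix the inverted names
    mv "ptally_$f" "$f"
    mv tmp_swap.csv "ptally_$f"
done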
sed -n "100013,200013w ptally_$f" "$f"
inside the loop – Philippos Oct 06 '17 at 05:47for
loop, but hadn't gotten it quite right. Thanks! – avoyles Oct 06 '17 at 18:42
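Expanded into a complete loop, Philippos's suggestion gives the one-step workflow (a sketch; sed's w command writes the selected line range straight to the prefixed file, leaving the original intact):

for f in outfile_n*.csv; do
    sed -n "100013,200013w ptally_$f" "$f"
done

Since nothing is edited in place, no renaming or swapping is needed afterwards.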