#!/usr/bin/perl -i
use strict;
# The %re hash holds the regexp searches and replacement strings.
my %re = ();
my $tfile = shift;
open(TEMPLATE, "<", $tfile) || die "couldn't open $tfile for read: $!\n";
while(<TEMPLATE>) {
  chomp;
  my ($search,$replace) = split;
  $re{qr/$search/} = $replace;
};
close(TEMPLATE);
while (<>) {
  foreach my $s (keys %re) {
    s/$s/$re{$s}/g;
  };
  print;
}
This reads the template file and builds up an associative array (aka "hash") called %re of regular expression searches and replaces.
Then it loops over every remaining filename on the command line (e.g. input) and performs all of those search and replace operations on every line of input. It uses qr// to pre-compile the regular expressions. This is only a trivial optimisation if there aren't many lines in template, but can result in a very significant speedup if there are many lines.
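One caveat worth knowing: Perl hash keys are always plain strings, so a qr// object used as a key is stored in its stringified form rather than as the compiled object. If you want to hold on to the compiled regexes themselves (and apply the rules in template order), one option is an array of [regex, replacement] pairs. This is only a minimal sketch: the @template_lines data and the apply_rules name are made up for illustration and stand in for reading the real template file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical rules, standing in for the lines of the template file.
my @template_lines = ('cat dog', 'r[eo]d blue');

# Keep each compiled regex object together with its replacement string.
my @rules;
for my $line (@template_lines) {
    my ($search, $replace) = split ' ', $line;
    push @rules, [ qr/$search/, $replace ];
}

# Apply every rule, in template order, to one line of text.
sub apply_rules {
    my ($text) = @_;
    for my $rule (@rules) {
        my ($regex, $replace) = @$rule;
        $text =~ s/$regex/$replace/g;
    }
    return $text;
}

print apply_rules("cat red rod\n");   # prints "dog blue blue"
```

The array also makes the order of the replacements predictable, which keys %re does not guarantee.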
The -i on the #!/usr/bin/perl -i line causes perl to make in-place edits on the input files, rather than just print the changes to stdout. Change this to, e.g., -i.bak if you want it to keep a backup copy of the files before they were changed.
Save as, e.g., cryptic0.pl, make it executable with chmod +x cryptic0.pl and run it like this:
$ ./cryptic0.pl template input
The script will not produce any output on the terminal. Instead, it will edit the input file(s).
For example, your input file will be changed to:
$ cat input
gene_id "AT1G01030";
gene_id "AT1G01030";
gene_id "AT1G01010";
gene_id "AT1G01035";
BTW, this script will change all matches on all lines to their appropriate replacement string. If you are certain that there can only be one match on any given line, you can speed it up by changing this line:
s/$s/$re{$s}/g;
to this:
s/$s/$re{$s}/ && last;
This causes the script to skip out of the foreach loop to the print statement, and then move on to the next input line as soon as it has had one successful search & replace.
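To see what the early exit does, here is a small stand-alone sketch of the "stop after the first successful replacement" idea. The rules and the replace_first_match name are hypothetical, and the rules sit in an array (not the %re hash from the script) so that it's predictable which rule gets tried first.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical rules in a fixed order.
my @rules = (
    [ qr/cat/, 'dog'  ],
    [ qr/red/, 'blue' ],
);

# Stop at the first rule that actually changes the line.
sub replace_first_match {
    my ($text) = @_;
    for my $rule (@rules) {
        my ($regex, $replace) = @$rule;
        last if $text =~ s/$regex/$replace/;
    }
    return $text;
}

print replace_first_match("a red cat\n");    # prints "a red dog"
print replace_first_match("a red bird\n");   # prints "a blue bird"
```

Note that the first line still contains "red" afterwards: once one rule has matched, the remaining rules are never tried, which is why this shortcut is only safe when each line can match at most one rule.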
BTW, see Why is using a shell loop to process text considered bad practice? for why it's not a good idea to do text processing with sh loops. Use awk or perl or sed or anything else instead of sh or bash.
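If the search strings are fixed strings rather than regular expressions (an assumption, and not what the script above does), another way to avoid re-scanning the text once per rule is to join all the search strings into a single alternation and look the replacement up in a hash. The following is only a sketch: the %map contents and the replace_all name are invented for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical literal search => replacement pairs.
my %map = (
    'cat'    => 'dog',
    'catnip' => 'mint',
    'red'    => 'blue',
);

# Build one alternation of all the keys, longest first, so that
# "catnip" is tried before its prefix "cat". quotemeta() escapes any
# regex metacharacters so that the keys are matched literally.
my $alternation = join '|',
    map  { quotemeta($_) }
    sort { length($b) <=> length($a) || $a cmp $b }
    keys %map;
my $regex = qr/($alternation)/;

# A single substitution pass handles every rule at once.
sub replace_all {
    my ($text) = @_;
    $text =~ s/$regex/$map{$1}/g;
    return $text;
}

print replace_all("red cat likes catnip\n");   # prints "blue dog likes mint"
```

Because each position in the text is only rewritten once, the output of one replacement is never fed into another rule, which is a difference in behaviour from the loop over %re above.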
MSTRG.2 is being replaced in the final output file. You are clobbering the file in each iteration, so I'd expect that only the last replacement would be visible. Just do the redirection to output after the loop instead of in each iteration (just like you're already redirecting the loop's input from template). I'm not posting this as an answer since I'm not sure why you say (or how it's possible) that MSTRG.2 is being replaced. – Kusalananda Oct 15 '19 at 16:04
[...] while ...; do ...; done <template >output. And redirect the two echo calls to standard error to avoid getting them in the output: echo ... >&2. – Kusalananda Oct 15 '19 at 16:26
[...] read loop at all, but do it in one pass by adapting the scheme given in the linked answer. – Philippos Oct 15 '19 at 19:46