Note that -i and \b are both non-standard extensions that some sed implementations have copied from perl. So why not use perl in the first place:
    perl -i -pe '
      BEGIN {
        %map = (
          "key1" => "value1",
          "key 2" => "value2"
        );
        $re = join "|", map {qr{\Q$_\E}} keys %map;
      }
      s/\b(?:$re)\b/$map{$&}/g' your-file
The key => value mapping can also be expressed as:
    %map = qw(
      key1 value1
      key2 value2
    );
Or read from CSVs or other structured formats using the corresponding perl module (Text::CSV, JSON)... perl is a general-purpose programming language geared towards text manipulation, so it's the obvious choice here and there's no limit to what you can do with it.
For a simple TSV, that could be:
    <map.tsv perl -i -pe '
      BEGIN {
        <STDIN>; # skip header
        while (<STDIN>) {
          chomp;
          my ($k, $v) = split /\t/;
          $map{$k} = $v;
        }
        $re = join "|", map {qr{\Q$_\E}} keys %map;
      }
      s/\b(?:$re)\b/$map{$&}/g' your-file
Note that, if you were doing:
    sed -i -e 's/\bK1\b/V1/g' file
    sed -i -e 's/\bK2\b/V2/g' file
That can be simplified to:
    sed -i '
      s/\bK1\b/V1/g
      s/\bK2\b/V2/g' file
Or for your TSV:
    <map.tsv awk -F'\t' '
      NR > 1 {
        # escape regexp operators in keys to emulate perl \Q \E:
        gsub(/[][\/\\*.^$]/, "\\\\&", $1)
        # escape /, \ and & in replacement:
        gsub(/[\\/&]/, "\\\\&", $2)
        print "s/\\b"$1"\\b/"$2"/g"
      }' | sed -i -f - your-file
which reads (and writes) the file only once.
But in both cases, you'd run into problems if some of the values are also among the keys. For instance with s/\bA\b/B/g followed by s/\bB\b/C/g you'd end up with the As turned into Cs instead of Bs. The perl approach above doesn't have that problem as it runs only one substitute operator.
Also note that perl in its regexps processes alternation from left to right, so if you have s/\b(?:foo|foo bar)\b/$map{$&}/g, on a foo bar input, it would replace foo, not foo bar.
And bear in mind that the order in which an associative array is walked is random.
sed (with those implementations that support extended regexps with -E / -r or \| in BREs) in contrast will try to find the longest match.
You could get the same behaviour with perl by sorting the keys by length before joining with |, for instance by replacing keys %map with sort {length$b <=> length$a} keys %map.
A final note: perl by default processes its input byte-wise, with word characters (\b matching the boundary between a word and a non-word character) limited to the ASCII letters, digits and underscore, while sed implementations typically decode it according to the locale's charset. If your input or keys/values contain non-ASCII characters, you can add -Mopen=locale to decode it as per the locale's charset, or if it's in UTF-8 (the locale encoding most commonly used these days), just add the -C option.
key=value? Or key\tvalue? Or maybe key value? Can you have spaces in either key or value? Just show us a few lines and the output you want from those lines so we can understand. – terdon Aug 18 '22 at 11:41
sed is the wrong way to go about it (partly due to the risk of updating the wrong thing, and partly due to the fact that values in these formats are encoded). – Kusalananda Aug 18 '22 at 12:20
key-value pairs is not important. I can re-arrange them in any possible format. – Googlebot Aug 18 '22 at 12:28
sed substitution instructions: s/World Health Organization/WHO/g. You may then use that list of instructions with sed -f to apply the instructions to whatever file you have. – Kusalananda Aug 18 '22 at 16:19