
Is there a UNIX mechanism for replacing matched strings dynamically--with a function of the matched string?

E.g., say I want to replace URL matches with their URL-encoded counterparts, convert certain matches from snake_case to camelCase, or just upper-case them.

Ruby has a gsub method that takes a lambda ("a block" in Ruby parlance), but I'd rather not use Ruby.

I tried with standard tools and FIFO, but I keep losing white space around my matches somewhere in the read part (see below). Any guesses?

#!/bin/bash

d="\f" #<=A character that's not expected in the input
swapNewlines() { tr "$d"'\n' '\n'"$d"; }  #Since unix tools are line-oriented

#Sample transformation -- coloring red
export C_red="$(tput setaf 1)"
export C_normal="$(tput sgr0)"
transform(){ printf "$C_red%s\n$C_normal" "$*"; }

even() { sed -n '2~2p'; }
odd() { sed -n '1~2p'; }

#Open an anonymous FIFO and assign its FD to the variable whose name is given in $1
mkchan(){
  local __name="${1:-FD}" __tmpd= __ret=1
  if __tmpd="`mktemp -d`" && mkfifo "$__tmpd/p" && eval "exec {"$__name"}<>"'$__tmpd/p'; then
    __ret=0
  fi
  rm -rf "$__tmpd" 
  return "$__ret"
}

#No-op
df  |sed 's/\<[1]*\>/'"$d"'\0'"$d"'/g' | swapNewlines | swapNewlines |tr -d "$d"
printf '%s\n' -------------------------------------------------------

mkchan fd; export fd
#Surround matches with the "$d" character and swap newlines with fd; then do line-processing
df  |sed 's/\<[1]*\>/'"$d"'\0'"$d"'/g' | swapNewlines | 
   tee >(even >&"$fd") | 
   odd | while read o; 
          do printf "%s\n" "$o"
            read e <&"$fd"
            #printf '%s\n' "$e"
            transform "$e"
          done  |
          swapNewlines |tr -d "$d"
muru
Petr Skocik
  • while IFS= read -r o ... do IFS= read -r o... but a better approach is to set IFS in the script so you don't have to loop over the assignment. it is better still not to use read, but to coagulate the data in more capable reader utility and . dot what you want (as . will typically pull data in 4k blocks rather than byte by byte). - and this is problematic: eval "exec {"$__name"}<>"'$__tmpd/p' – mikeserv Jan 01 '16 at 17:42
  • I would use Perl, where the replacement part of s/// can be an arbitrary expression when the e flag is used. E.g. to uppercase: perl -pe 's/(pattern)/ uc $1 /ge' file – glenn jackman Jan 01 '16 at 22:04

1 Answer


The standard tools for dynamically computed string replacement are the shell itself and AWK. Standard sh has a few string manipulation constructs, and bash has a few more; you'd typically use them for something simple on a small amount of data. AWK is a Turing-complete language of its own, with classical imperative constructs (variable assignment, arrays of strings and maps from strings to strings, if statements, while loops, …) and string manipulation primitives (concatenation, splitting, regex matching and replacement, though without match groups, …). There's also sed, which is Turing-complete as well but gets very hairy as soon as you go beyond a simple regex replacement.
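As a rough illustration of the AWK approach, here is a minimal sketch that upper-cases every match of a regex while copying everything else, whitespace included, unchanged. The function name upcase_matches is made up for this example, and toupper() stands in for whatever transformation you actually need. It assumes the regex contains no backslashes, since awk -v interprets escape sequences in the assigned value.

```shell
#!/bin/sh
# upcase_matches REGEX: copy stdin to stdout, upper-casing each match of REGEX.
# (upcase_matches is a hypothetical helper name, not a standard tool.)
upcase_matches() {
  awk -v re="$1" '
  {
    out = ""
    rest = $0
    # match() sets RSTART and RLENGTH for the leftmost match in rest.
    while (rest != "" && match(rest, re)) {
      if (RLENGTH == 0) {
        # Empty match: copy one character and move on to avoid looping forever.
        out = out substr(rest, 1, 1)
        rest = substr(rest, 2)
        continue
      }
      # Text before the match is copied as-is; the match itself is transformed.
      out = out substr(rest, 1, RSTART - 1) toupper(substr(rest, RSTART, RLENGTH))
      rest = substr(rest, RSTART + RLENGTH)
    }
    print out rest
  }'
}
```

For example, printf '  snake_case  and  other_name \n' | upcase_matches '[a-z]+_[a-z]+' keeps the spacing of the line intact and changes only the two matched words. Replacing toupper(...) with a call to a user-defined awk function gives you an arbitrary function of the matched string.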

The most common reason for losing whitespace is an unquoted variable expansion. You also need to take care when using read: its job is to split lines into fields, so if you want to read a line literally, you need IFS= read -r (IFS= disables the splitting and trimming, and -r disables backslash processing).
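The difference is easy to see on a sample line. The following is a small sketch with a made-up test string; the variable names are only for illustration.

```shell
#!/bin/sh
# Sample line with leading, trailing, and repeated internal spaces.
line='  two  leading,  one trailing '

# Plain read: leading and trailing IFS whitespace is stripped.
plain=$(printf '%s\n' "$line" | { read x; printf '[%s]' "$x"; })

# IFS= read -r: the line is stored exactly as it was read.
literal=$(printf '%s\n' "$line" | { IFS= read -r x; printf '[%s]' "$x"; })

# Unquoted expansion: the value is split into words, so runs of spaces collapse.
unquoted=$(echo $line)

printf '%s\n' "$plain" "$literal" "[$unquoted]"
# prints:
# [two  leading,  one trailing]
# [  two  leading,  one trailing ]
# [two leading, one trailing]
```

Only the second form keeps the whitespace at both ends of the line, which is what a match-by-match pipeline like the one in the question depends on.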

I haven't reviewed your script fully — it looks pretty complex for what it's doing — but you probably want while IFS= read -r o instead of while read o, if you want to preserve whitespace. However, to post-process the output of df, you should instead use something like

df -P | awk '…'