
If you want a sed with Perl-style regular expressions, it seems that there are two ways to get one:

    ssed -R
    perl -pe

Is there any reason to prefer one tool over the other for this purpose? Both support the -i flag. I assume they use the same Perl regular-expression library, and would therefore probably have similar performance and identical features and bugs. It also seems pretty uncommon for perl not to be installed on a modern Unix machine.
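
For concreteness, here is one hypothetical edit done both ways (the lookbehind pattern and the file name are made up for illustration; -R is ssed's Perl-regex flag as described above, and -i edits in place in both tools):

    # in-place edit using a Perl-style lookbehind, via ssed's PCRE mode
    ssed -R -i 's/(?<=version )\d+/9/' config.txt

    # the same edit via perl
    perl -i -pe 's/(?<=version )\d+/9/' config.txt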

  • ssed threw me off guard; this is the first time I'm hearing of it. It's not installed by default in common Linux distros, whereas perl often is. And ssed uses PCRE, not Perl's regex engine, so there may be some differences. – muru Nov 10 '15 at 20:12
  • Related: http://unix.stackexchange.com/questions/110875/using-modifiers-of-perl-compatible-regex-pcre-in-grep. So maybe you can use Perl's modifiers if you use perl, but not if you use ssed. – Nov 11 '15 at 20:44
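
To illustrate that last comment: Perl's s///e modifier, which evaluates the replacement side as Perl code, has no PCRE counterpart, so it is available through perl -pe but not through a PCRE-based tool such as ssed. A minimal sketch (the input line is made up):

    # double every integer on each line; /e evaluates '$1 * 2' as Perl code
    printf 'a 2 b 10\n' | perl -pe 's/(\d+)/$1 * 2/ge'
    # output: a 4 b 20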

1 Answer


ssed is a much smaller binary than perl: /usr/bin/ssed is 123K with PCRE linked statically, while /usr/bin/perl is 1.7M, and that's not counting any of the standard modules installed alongside it.

ssed could fit on a tiny distro or a rescue/installer ISO, whereas perl might not. It's hard to see the point, though, because PCRE is mostly just a convenience: without too much extra work, basic or extended regexes in ordinary sed can do most of what PCRE does... and that would save another 123K of space.
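
For example (a hypothetical pattern, just to make the translation concrete), the PCRE shorthand classes map mechanically onto POSIX classes that ordinary sed understands:

    # PCRE shorthand classes, via ssed -R or perl -pe
    echo '12  foo' | ssed -R 's/\d+\s+\w+/NUM WORD/'

    # a roughly equivalent POSIX ERE for ordinary sed -E (or -r on older GNU sed)
    echo '12  foo' | sed -E 's/[0-9]+[[:space:]]+[[:alnum:]_]+/NUM WORD/'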

Of lesser importance, ssed is also probably faster than perl, at least in startup overhead: compiling a perl script isn't exactly fast. This is likely to matter only in shell scripts with for/while loops that repeatedly call ssed or perl.
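
A rough way to measure that startup cost (a sketch; the iteration count is arbitrary, the numbers will vary by system, and only the relative difference is interesting):

    # compare bare startup overhead over many invocations
    time for i in $(seq 1 500); do ssed -R 's/a/b/' </dev/null; done
    time for i in $(seq 1 500); do perl -pe 's/a/b/' </dev/null; done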

cas
  • Hmmm... your priorities are different from mine. I personally don't mind the 1.5MB of space nearly as much as I do the few nanoseconds of additional exec time per call. But, for that matter, scripts that exec other programs in the body of a loop are almost always awful anyway: every program has a built-in loop all its own, and a shell script which prefers its own loop to a child process's loop is not using that child process effectively. But the answer is valid and good all the same. – mikeserv Nov 10 '15 at 21:24
  • They're not my priorities; I don't care much about either issue. But space may be very important for a tiny distro or installer CD. I have both ssed and perl (and lots of Perl modules) installed on my systems. A few nanoseconds, or even micro- or milliseconds, are irrelevant in a shell script except in a loop... and shell scripts execing other programs in a loop is 100% perfectly normal, not inherently 'awful'. Not all programs called by shell scripts have loops: basename, for example, doesn't have one. – cas Nov 10 '15 at 21:30
  • basename doesn't make any sense in a shell script, but it is a valid point. I disagree about time: concurrent processes chained together in a shell pipeline, all working a single stream, is almost always the correct way to do a thing. Execing other programs in a shell loop when the whole thing might be done without one - which is usually how that works out - makes a shell script awful. – mikeserv Nov 10 '15 at 21:31
  • Huh? basename is often used in shell scripts and in shell loops. Have you never done something like for f in * ; do bn=$(basename "$f" .txt) ; sed -e '...' "$f" > "$bn.out" ; done? There are better ways of doing that in bash or ksh or zsh, but they aren't sh. Anyway, basename was merely an example; there are many other programs called from shell loops that don't have loops themselves. – cas Nov 10 '15 at 21:39
  • Different problems require different solutions/tools. You can't just use a hammer for everything. – cas Nov 10 '15 at 21:40
  • Now that's exemplary of an awful shell script. basename called from a POSIX shell is both slower than what the shell might do on its own and more prone to err, in every case in which I have ever seen it used. I won't rule out completely that it might have a use in a shell script, but I've never witnessed one in practice. – mikeserv Nov 10 '15 at 21:41
  • Please tell me how to get to your perfect world where you never, ever have to do anything less than absolutely perfect using magical DWIM tools. – cas Nov 10 '15 at 21:43
  • And, by the way, if you're looping through a list of a few hundred files and you actually care about an extra ms or so on each pass through the loop, you are wasting your time on trivial, irrelevant optimisations. That adds up to a few hundred milliseconds over the life of the script; a human is barely capable of noticing that, let alone thinking "that's way too slow". – cas Nov 10 '15 at 21:46
  • I guess the answer to that problem is not to loop over a hundred text files in the shell. It is naive in that it would need some additional prep to accurately handle newlines or : in filenames, but grep '' ./* /dev/null | sed ... and so on would be a good start to dropping the shell loop entirely. What the shell does best is mangle arguments: by all means, build your list of files up in a shell loop with native commands, and then call your target pipeline to work the list, but don't call it in the loop. That's terrible scripting. – mikeserv Nov 10 '15 at 21:52
  • Not everything can be done in a pipe. An extremely obvious example: sometimes you need to make if/then/else decisions based on information unavailable from stdin. That for loop with the sed example above can be optimised so that the sed script (or whatever) is run only if "$bn.out" doesn't already exist, saving enormous amounts of time if the sed script is slow or is processing huge files: a few milliseconds in the loop versus minutes or longer re-processing a file that has already been processed. – cas Nov 10 '15 at 21:56
  • You mean sometimes you need to mangle arguments? I completely agree. Or, if you mean from other files, then yes, that is the other thing at which the shell excels: redirections. And so (leaving split aside for a moment), IFS=; while read -r in; do case $in in ($match) ;; (*) printf %s\\n "$in"; sed -ne"/$match/q;p"; esac; _other_pipeline <other_input; done <input >output is a good example of other programs called from a shell loop, in that it factors the loop based on cues from input and swaps/blends streams when necessary. – mikeserv Nov 10 '15 at 22:03
  • @cas: You don't need basename (or, of course, dirname) in a shell script; all their functionality can be had with parameter expansion. For the shell loop, I invite you to read http://unix.stackexchange.com/q/169716/38906 – cuonglm Nov 11 '15 at 01:36
  • @cuonglm What is it about the phrase 'for example' that's so difficult to understand? 1. It was an example, specifically stated to be a stand-in for the many other commands that can be run in a shell loop. 2. I also specifically stated that there are better ways to do it in bash etc. 3. I was not arguing that basename is fabulous and should be used all the time. I was arguing that mikeserv's attitude was far too black-and-white and dogmatic, and that it was disconnected from the practical reality of writing everyday shell scripts. – cas Nov 11 '15 at 01:40
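
As a footnote to cuonglm's point about parameter expansion: a minimal sketch of the basename loop from the comments above, rewritten with no external call (the sed substitution here is a placeholder standing in for the elided '...' in that comment):

    for f in *; do
        bn=${f##*/}      # strip any leading directory, as basename would
        bn=${bn%.txt}    # strip the .txt suffix, as basename "$f" .txt would
        sed -e 's/foo/bar/' "$f" > "$bn.out"    # placeholder sed script
    done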