
Bash has built-in regex support for pattern matching. The sed and egrep commands can also do this.

What is the benefit of choosing the built-in over the external commands? I would like to know which one is faster, as well as other aspects of the comparison.

UPDATE:

Sorry, I might have mistaken some Bash feature for regex.

by "built-in regex", I meant Bash's string manipulation mentioned in Bash String Manipulation, in particular,

String removal

stringZ=abcABC123ABCabc
echo ${stringZ#a*C}      # 123ABCabc

String replacement

stringZ=abcABC123ABCabc
echo ${stringZ/a?c/xyz}       # xyzABC123ABCabc
                              # Replaces the first match of 'a?c' (here 'abc') with 'xyz'.

Are they regex?

oldpride

1 Answer


Addressing the updated question:

What you are showing are, strictly speaking, not applications of regular expressions in the shell. Both are parameter expansions using shell globs, the same sort of patterns you would use for filename globbing, e.g. things like cat text*.txt >combined.

The first expansion is a standard (shortest-match) prefix string removal, while the second is a non-standard (but implemented by bash and some other shells) more general substitution. Neither uses regular expressions, and you could not feed these shell globbing patterns to grep, sed, or awk to do the same sort of operation.
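
For comparison, a rough sketch of how the same two transformations could be done with regular expressions in sed (the patterns below are my own regex approximations of the glob patterns above):

stringZ=abcABC123ABCabc
# shortest-prefix removal ${stringZ#a*C}: sed's regexes are greedy,
# so match up to the first 'C' explicitly with [^C]*
printf '%s\n' "$stringZ" | sed 's/^a[^C]*C//'    # 123ABCabc
# substitution ${stringZ/a?c/xyz}: '.' is the regex analogue of glob '?'
printf '%s\n' "$stringZ" | sed 's/a.c/xyz/'      # xyzABC123ABCabc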

To use regular expressions in the shell, the shell must support it (it is not a standard feature of a Unix shell, although many shells provide it), and you must use the syntax that the shell provides, which in the case of bash is by using the =~ operator within [[ ... ]].
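
For example, a minimal sketch (the date format is just an assumed example) that tests a string against an extended regular expression and extracts captured groups through the BASH_REMATCH array:

input=2023-09-04
if [[ $input =~ ^([0-9]{4})-([0-9]{2})-([0-9]{2})$ ]]; then
    printf 'year=%s month=%s day=%s\n' \
        "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}" "${BASH_REMATCH[3]}"
fi
# outputs: year=2023 month=09 day=04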

The use of basic regular expressions (as opposed to extended regular expressions) is also made possible in a limited way by the standard expr utility. But this is very rarely used.
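
A small sketch of expr's regex matching (the string is made up for illustration):

string=abc123
# expr's ':' operator matches a basic regular expression, implicitly
# anchored at the start of the string; a \(...\) group makes expr
# output the captured substring instead of the match length
expr "$string" : '[a-z]*\([0-9]*\)'    # outputs: 123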


Addressing the original formulation of the question:

You pick the tools that are appropriate for the job at hand.

The tools and their basic usages (with illustrative command sketches after the list):

  1. You would use =~ within [[ ... ]] in the bash shell to apply a regular expression to a string stored in a shell variable. This is typically used for testing whether a string matches a certain expression and potentially to extract substrings. It's ideal for tasks such as validating user-supplied input or handling short strings; tasks that don't involve line-by-line processing in a loop.

  2. You may use grep for simpler file-processing tasks. It's useful for extracting lines from a stream, or from one or several files, based on patterns, either regular expressions or plain strings. It can also test whether one or several patterns are present in the input data. Most tasks you'd use grep for may also be performed by sed, but the opposite is not true.

  3. To perform more advanced processing of files, you may employ sed. It allows you to edit a stream, or one or several documents, using substitutions with regular expressions within lines. Additionally, you can prepend, append, replace, or delete lines based on absolute line numbers, regular expressions, or specified ranges. Being a stream editor, the editing done with sed is often of the same type as you would otherwise have needed to do using a text editor. Most tasks you'd use sed for may also be performed by awk, but the opposite is not true.

  4. When dealing with structured text data and requiring versatile data manipulation, awk may be more suitable than sed. You would use awk to process text files, particularly for tasks like extracting specific columns, performing mathematical operations, and applying custom logic to filter, transform, or aggregate data. Some of this processing would potentially involve awk's built-in ability to apply custom code to records matching particular regular expressions, or to use regular expressions in substitutions etc.

  5. Some structured formats, such as JSON, YAML, XML, and CSV (using more advanced quoting rules than simple comma-separated values), require care and knowledge about how the rules of the format work with regards to quoting and character encoding etc. For these types of data, specialized processing software should be used, such as jq, Miller (mlr), xmlstarlet, csvkit etc. Many of these tools allow you to safely work with the given data using regular expressions if the task at hand requires it.
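
To make the above concrete, here are illustrative sketches of typical invocations for each tool (all file names, patterns, and fields are invented for the sake of example):

# 1. bash: test a user-supplied string against an ERE, no external process
[[ $reply =~ ^[Yy]([Ee][Ss])?$ ]] && echo confirmed

# 2. grep: extract lines matching an ERE from a file
grep -E '^ERROR: ' logfile.txt

# 3. sed: regex substitution on every line (here: trim trailing whitespace)
sed 's/[[:space:]]*$//' input.txt >trimmed.txt

# 4. awk: apply logic to records whose second field matches a regex
awk -F, '$2 ~ /^user_/ { sum += $3 } END { print sum }' data.csv

# 5. jq: select JSON objects whose .name matches a regex
jq '.items[] | select(.name | test("^prod-"))' data.json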

It is more common to start with a task and select the tool than the other way around.

Kusalananda
  • Thank you for mentioning the methodology and extra tools. My impression is that commands like sed/grep are more optimized and support more features than Bash's built-in regex. But using Bash's built-in regex will save spawning a sub-process. I am wondering how optimized Bash's built-in regex is. – oldpride Sep 03 '23 at 18:11
  • @oldpride I'd be slightly surprised if the shell had its own implementation of the regular expression engine when there is a perfectly usable regular expression implementation available in the C library (also used by grep, sed and awk and others). So again, it comes down to what the task is. You would most definitely not use bash to loop through every line of a file to see if they match an expression, for example. Not because of the regular expression, but due to the slowness and awkwardness of the shell loop (see link in answer). – Kusalananda Sep 03 '23 at 18:15
  • @oldpride Likewise, you would not call grep, sed or any other external tool to see whether the string given to you from the user ends with six copies of the same digit (or whatever). That's a task for a regular expression match in the shell (using =~ within [[ ... ]]). – Kusalananda Sep 03 '23 at 18:17
  • @Kusalananda, thank you for mentioning your surprise. I may have mistaken some bash feature for regex. See my updated post. The string manipulations are definitely internal to Bash, meaning no sub-process is involved. But I am not sure that they are regex now. They are definitely patterns. – oldpride Sep 04 '23 at 03:31
  • @oldpride See the added bit at the top of the answer. – Kusalananda Sep 04 '23 at 06:44
  • Thank you for the updated answer. It answered my question. – oldpride Sep 04 '23 at 15:55