Addressing the updated question:
What you are showing are, strictly speaking, not applications of regular expressions in the shell. Both are parameter expansions using shell globs, the same sort of patterns that you'd use as filename globbing patterns to do filename expansions, e.g. things like `cat text*.txt >combined`.
The first expansion is a standard prefix string removal, while the second is a non-standard (but implemented by `bash` and some other shells) more general substitution. Neither uses regular expressions, and you would not be able to do the same sort of operation with shell globbing patterns using `grep`, `sed`, or `awk`.
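As a minimal sketch of the two expansions (the variable and patterns are made up for illustration):

```bash
#!/bin/bash
file='textfile.txt'

# Standard prefix removal: deletes the shortest match of the glob
# pattern "text" from the start of the value.
printf '%s\n' "${file#text}"      # -> file.txt

# Non-standard substitution: replaces the first match of the glob
# pattern ".txt" anywhere in the value (works in bash, not plain sh).
printf '%s\n' "${file/.txt/.md}"  # -> textfile.md
```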
To use regular expressions in the shell, the shell must support it (it is not a standard feature of a Unix shell, although many shells provide it), and you must use the syntax that the shell provides, which in the case of `bash` is by using the `=~` operator within `[[ ... ]]`.
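A small sketch of the syntax (the string and pattern are hypothetical):

```bash
#!/bin/bash
string='release-2.14'
# =~ applies a POSIX extended regular expression; leave the pattern
# unquoted (or put it in a variable) so it isn't taken literally.
if [[ $string =~ ^release-[0-9]+\.[0-9]+$ ]]; then
    echo 'matches'
fi
```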
The use of basic regular expressions (as opposed to extended regular expressions) is also made possible in a limited way by the standard `expr` utility, but this is very rarely used.
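For completeness, a sketch of `expr` matching a basic regular expression (the string is made up):

```sh
string='abc123'
# expr matches a BRE anchored at the start of the string; with a
# \(...\) group it prints the captured substring instead of the
# number of matched characters.
expr "$string" : '[a-z]*\([0-9]*\)'   # -> 123
```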
Addressing the original formulation of the question:
You pick the tools that are appropriate for the job at hand.
The tools and their basic usages:
You would use `=~` within `[[ ... ]]` in the `bash` shell to apply a regular expression to a string stored in a shell variable. This is typically used for testing whether a string matches a certain expression and potentially to extract substrings. It's ideal for tasks such as validating user-supplied input or handling short strings: tasks that don't involve line-by-line processing in a loop.
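As a sketch, validating user-supplied input and extracting a substring (the prompt and pattern are illustrative; captured groups land in the `BASH_REMATCH` array):

```bash
#!/bin/bash
read -r -p 'Enter a date (YYYY-MM-DD): ' date
# Index 0 of BASH_REMATCH holds the whole match, 1 the first group.
if [[ $date =~ ^([0-9]{4})-[0-9]{2}-[0-9]{2}$ ]]; then
    printf 'year: %s\n' "${BASH_REMATCH[1]}"
else
    printf 'invalid date\n' >&2
fi
```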
You may use `grep` for simpler file-processing tasks. It's useful for extracting lines from a stream, or from one or several files, based on patterns, either regular expressions or plain strings. It can also test whether one or several patterns are present in the input data. Most tasks you'd use `grep` for may also be performed by `sed`, but the opposite is not true.
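A few typical invocations (the file names and patterns are made up):

```bash
# Extract lines matching an extended regular expression.
grep -E 'ERROR|WARN' logfile

# Match a fixed string rather than a pattern.
grep -F '1.2.3.4' access.log

# Only test for presence; the exit status tells whether it matched.
grep -q 'pattern' file && echo 'found'
```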
To perform more advanced processing of files, you may employ `sed`. It allows you to edit a stream, or one or several documents, using substitutions with regular expressions within lines. Additionally, you can prepend, append, replace, or delete lines based on absolute line numbers, regular expressions, or specified ranges. Being a stream editor, the editing done with `sed` is often of the same type as you would otherwise have needed to do using a text editor. Most tasks you'd use `sed` for may also be performed by `awk`, but the opposite is not true.
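Some representative `sed` edits (file names and patterns are illustrative):

```bash
# Substitute with a basic regular expression on every line.
sed 's/[[:space:]]*$//' file           # strip trailing whitespace

# Delete lines by line number or by pattern.
sed '1d' file                          # drop the first line
sed '/^#/d' file                       # drop comment lines

# Restrict a substitution to a range between two patterns.
sed '/^BEGIN$/,/^END$/s/foo/bar/' file
```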
When dealing with structured text data and requiring versatile data manipulation, `awk` may be more suitable than `sed`. You would use `awk` to process text files, particularly for tasks like extracting specific columns, performing mathematical operations, and applying custom logic to filter, transform, or aggregate data. Some of this processing would potentially involve `awk`'s built-in ability to apply custom code to records matching particular regular expressions, or to use regular expressions in substitutions, etc.
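A few sketches of these uses (file names and field positions are made up):

```bash
# Run code only on records matching a regular expression:
# print the 2nd whitespace-delimited field of ERROR lines.
awk '/^ERROR/ { print $2 }' logfile

# Aggregate: sum the 3rd column and report the total at the end.
awk '{ total += $3 } END { print total }' data.txt

# Use a regular expression in a substitution on every record.
awk '{ gsub(/[0-9]+/, "N"); print }' file
```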
Some structured formats, such as JSON, YAML, XML, and CSV (using more advanced quoting rules than simple comma-separated values), require care and knowledge about how the rules of the format work with regard to quoting, character encoding, etc. For these types of data, specialized processing software should be used, such as `jq`, Miller (`mlr`), `xmlstarlet`, `csvkit`, etc. Many of these tools allow you to safely work with the given data using regular expressions if the task at hand requires it.
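For instance, a sketch using `jq`'s `test` function, which applies a regular expression to JSON string values without risking the quoting rules of the document (the file and field names are hypothetical):

```bash
# Select array elements whose "name" field matches the pattern.
jq '.[] | select(.name | test("^user-[0-9]+$"))' users.json
```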
It is more common to start with a task and select the tool than to do the opposite.