I am writing a bash script to "centralize" and ease the lookup of information for our less experienced employees to use when providing technical support.

How realistically acceptable is this code from the strictest Bash scripter's point of view? This is essentially the whole script in a nutshell (actually a bash shell, ha ha.)

#!/bin/bash

declare -a array
array=(`grep -w foo /var/log/bar.log  | awk '{print $1,$2,$3,$14,$16}' | sed 's/<//g; s/>,//g; s/>//g;' | tr [:blank:] , && ssh XXX.XXX.XXX.XXX 'grep -w foo /var/log/bar.log' | awk '{print $1,$2,$3,$14,$16}' | sed 's/<//g; s/>,//g; s/>//g;' | tr [:blank:] ,`)

The script then continues on to operate on the array. The reason I'm running it like this is that I would like everything to be retained in RAM, which I can elaborate on if necessary.

I know the code is pretty ugly, but is there any cleaner way this could be done without changing my goal or the programming language, or adding more lines of code? I know I can clean up the sed regexp, but beyond that I currently can't think of anything better...

  • cleaner to rewrite the whole grep/awk/sed/tr combo in pure awk – steve Sep 29 '17 at 19:03
  • Don't use backticks, use $( ... ) instead. – jesse_b Sep 29 '17 at 19:05
  • @steve Very interesting suggestion. I knew awk was powerful, but through schooling or personal interest never ventured into it. I'll have to get involved with awk scripting some day when I have more time. – Nate.sh Sep 29 '17 at 20:51
  • @Jesse_b Thanks for the heads up! I knew both methods were available to me, but was never aware of explicit uses for one over the other. Looking into it, I've found backticks may have caused some of the odd scenarios I've run into in the past... – Nate.sh Sep 29 '17 at 20:53
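A side note on the backticks comment above: $( ... ) nests cleanly and is easy to spot in a long line, while backticks need an extra backslash at every nesting level. A minimal sketch of the difference:

# nesting with backticks requires escaping, and it gets worse per level
inner=`basename \`pwd\``

# $( ... ) nests without any escaping
inner=$(basename "$(pwd)")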

1 Answer

Yes, possibly, sometimes... It depends.

Sorry for this rambling answer. The question does not mention the purpose and use of the array, nor the contents of the data file, so it's difficult to say anything specific.

Summary: No, this is not the usual/idiomatic way to work with data in a shell script.


The code, as it is written, is hard to follow, as it is a rather long line. It looks like most of the operations could be performed by a single awk script (sketched below). It would fail my code review, I'm afraid.
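A rough, untested sketch of that idea (it assumes GNU awk for the \< and \> word boundaries, and that the angle brackets only wrap non-empty field values, so deleting them does not shift the field numbering):

awk -v OFS=',' '/\<foo\>/ {
    gsub(/[<>]/, "")              # strip the angle-bracket wrappers
    print $1, $2, $3, $14, $16    # OFS supplies the commas
}' /var/log/bar.log

The same script could be run on the remote host over ssh, just like the grep pipeline in the question.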

Just the fact that you're looking to put (potentially) a lot of data in an array tells me that you are going to process this array in one or several shell loops later. If it's just one loop, then why not pipe the result directly into the loop?

Well, see "Why is using a shell loop to process text considered bad practice?".
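That said, if you do end up with a single loop, the idiomatic pattern is to pipe straight into while read instead of building an array first. A minimal sketch:

grep -w foo /var/log/bar.log |
while IFS= read -r line; do
    printf 'processing: %s\n' "$line"    # operate on one line at a time
done

Note that in bash the loop runs in a subshell here, so any variables set inside it are gone when the loop ends.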

If the command pipeline only generates very few items in a restricted form (single words, or strings that are otherwise guaranteed to be well-behaved in the shell), this may be perfectly OK, but it's still not the idiomatic way of doing it.

The data has to be read at some point or other, and you may as well use it while reading it, without the added trouble of storing it in an array. Depending on what you're doing with the data, this may be done by awk or sed (or some other tool) directly.

As far as I can see, you are producing comma-separated strings, so maybe it creates a CSV dataset with each line as an entry in the array? This is perfect for feeding into awk, for example, again without temporarily storing it in an array. Or into one of the CSVkit tools, for that matter. You could even write it out to a real file and process it in one or several other scripts.
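For example (a sketch; produce_csv is a hypothetical function wrapping the pipeline from the question):

produce_csv() {   # hypothetical wrapper around the question's pipeline
    grep -w foo /var/log/bar.log |
        awk '{print $1,$2,$3,$14,$16}' |
        sed 's/<//g; s/>,//g; s/>//g' |
        tr '[:blank:]' ','    # quoting [:blank:] avoids accidental globbing
}

produce_csv | awk -F',' '{ print $2 }'    # feed the stream straight into awk
produce_csv > /tmp/foo.csv                # ...or keep a real file for other tools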

For people working on huge datasets (as I do), reading files into shell variables is impossible. Fortunately, most standard Unix tools act as filters and make it possible to pass data between the stages of a program on a more or less line-by-line basis, using pipelines. Reading data "into RAM" does not speed this up.

I almost never read data from a file into any sort of shell variable. I more often use variables for static strings, short-term temporary values, or counters, and arrays for static data, for when a simple parameter substitution is easier to perform on an array than sending the data through sed (like ${arr[@]%.*} for stripping the extension off a few filenames in arr), or for aggregating data in a short loop.
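To illustrate that last expansion, a tiny self-contained example:

arr=( report.txt data.csv notes.md )
printf '%s\n' "${arr[@]%.*}"    # prints report, data and notes, extensions stripped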

Kusalananda
  • The data set I'm working with is definitely not guaranteed to be well behaved in the shell, if I understand what you mean. I'm pulling specific lines from a log file which contain spaces and special characters ("<>", "-", "[]", "@" to name a few), so I'm trying to eliminate the possibility of introducing bugs by culling the information I want and then having it interpreted as a string. Each string is its own index in the array. But what you and the link you posted are saying is that I should be manipulating files on disk, with as few pipes as possible, right? – Nate.sh Sep 29 '17 at 21:41