Shell programming, avoiding tempfiles

Question

I often write KSH shell scripts that follow the same pattern:

(1) retrieve output from one or more commands
(2) format it using grep|cut|awk|sed and print it to the screen or to a file

In order to do that, I often store the output of (1) in a tempfile, and then do the formatting in (2) on that file.

Take this code, for instance:

TMPFILE=file.tmp

# If tmpfile exists rm it.
[ -f $TMPFILE ] && rm -f $TMPFILE

for SERVICE in $(myfunc); do
    getInfo $SERVICE > $TMPFILE # Store raw output in the TMPFILE

    # I retrieve the relevant data from the TMPFILE
    SERV_NAME=$(head -1 $TMPFILE | sed -e 's/ $//')
    SERV_HOSTNAME=$(grep HOSTNAME $TMPFILE | cut -d "=" -f2)
    SERV_ARGS=$(grep Arguments $TMPFILE | cut -d ":" -f2)

    print $SERV_NAME $SEP $SERV_HOSTNAME $SEP $SERV_ARGS
    rm -f $TMPFILE # rm the TMPFILE in view of the next iteration
done

Is there a way, using pipes, redirections and whatnots, to avoid writing a file to disk each time?

If it helps, I'm using ksh Version M-11/16/88i

It's good form to avoid ALL_CAPS variable names in shell scripts, and treat that namespace as reserved by the shell to avoid clobbering important things like PATH or other shell or environment variables. TMPFILE may be fine, but TMPDIR is special, so do you really want to be walking that tightrope? — jw013, Aug 24 '12 at 15:25
For posterity: another question which was marked as a duplicate of this one http://unix.stackexchange.com/questions/63923/pseudo-files-for-temporary-data includes an answer involving named FIFO pipes, which could also be used here (although it is probably not the best option in this particular case). — goldilocks, Feb 06 '13 at 13:46
@goldilocks: Maybe we can get the two questions merged into one. Can we contact a moderator to do this? — rahmu, Feb 06 '13 at 17:51
@rahmu : I flagged the other question. I guess it is up to the powers that be now... — goldilocks, Feb 06 '13 at 18:15

rozcietrzewiacz · Accepted Answer · 2012-08-24T15:15:20.963

9

Your code looks like an entirely justified example of using tempfiles to me. I'd say: stick with this approach. The only thing that really needs to be changed is the way you create the tempfile. Use something like

 TMP=$(tempfile)

or

 TMP=$(mktemp)

or at least

 TMP=/tmp/myscript_$$

This way the name can't be easily predicted (security), and you rule out interference between several instances of the script running at the same time.
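
Applied to the loop from the question, that advice might look like the sketch below. `myfunc` and `getInfo` are stand-ins defined only so the example runs; their output format is an assumption based on the question's grep patterns. `printf` replaces ksh's `print` for portability, and the variable names are lowercased as suggested in the comments on the question.

```shell
#!/bin/sh
# Hypothetical stand-ins for the question's commands (output format assumed).
myfunc() { echo "svcA svcB"; }
getInfo() {
    printf '%s \n' "$1"                 # first line: name plus a trailing space
    printf 'HOSTNAME=%s.example\n' "$1"
    printf 'Arguments:-v\n'
}

sep='|'
tmp=$(mktemp) || exit 1                 # unique name, created once

for service in $(myfunc); do
    getInfo "$service" > "$tmp"         # '>' truncates, so no rm per iteration
    serv_name=$(head -n 1 "$tmp" | sed -e 's/ $//')
    serv_hostname=$(grep HOSTNAME "$tmp" | cut -d "=" -f2)
    serv_args=$(grep Arguments "$tmp" | cut -d ":" -f2)
    printf '%s %s %s %s %s\n' "$serv_name" "$sep" "$serv_hostname" "$sep" "$serv_args"
done

rm -f "$tmp"
```

Creating the file once outside the loop and letting `>` overwrite it also removes the need for the existence check and the per-iteration `rm`.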

edited Aug 24 '12 at 15:15

answered Sep 28 '11 at 12:19

rozcietrzewiacz

pedantically, quotes are not required for variable assignment. – glenn jackman Sep 28 '11 at 13:00
@glenn True, in this case they should not make a difference, since each of the commands typically produces a string without spaces. But it is a good habit to have quotes in cases where you assign command output to a variable - so I'll persist on leaving it this way. – rozcietrzewiacz Sep 29 '11 at 06:36
Removed the quotes in the last example for distinction. – rozcietrzewiacz Sep 29 '11 at 06:40
@roz No, you missed the point. Variable assignments in shell are recognized before any expansions are done, and field splitting is NOT done for variable assignments. Thus, var=$(echo lots of spaces); echo "$var" is fine and should produce lots of spaces as output. The real caveat no one mentioned is command substitution strips all trailing newlines. This is not an issue here, and only matters e.g. if you had a broken mktemp that created file names with trailing newlines. The usual work around, if required, is var=$(echo command with trailing newline; echo x); var=${var%x}. – jw013 Aug 24 '12 at 13:42
@jw013 Yes, I realize this now - didn't, when I wrote the answer a year back. Thanks for pointing it out! (fixing...) – rozcietrzewiacz Aug 24 '12 at 15:13
@rozcietrzewiacz No worries :) I knew it was an old post - but being a public site my comment was mostly for posterity so anyone who sees this in the future doesn't get confused. – jw013 Aug 24 '12 at 15:18
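
The two points jw013 makes (no field splitting on assignment, and trailing newlines being stripped by command substitution) can be checked with a short sketch. The sample strings are made up for illustration.

```shell
#!/bin/sh
# No field splitting happens on assignment, so inner spaces survive
# even though the command substitution is unquoted.
var=$(echo "lots   of   spaces")
printf '[%s]\n' "$var"

# Command substitution strips every trailing newline...
stripped=$(printf 'name\n\n')
printf 'stripped length: %s\n' "${#stripped}"

# ...unless a sentinel character protects them and is removed afterwards.
kept=$(printf 'name\n\n'; echo x)
kept=${kept%x}
printf 'kept length: %s\n' "${#kept}"
```

The sentinel works because only the newline after `x` is stripped; `${kept%x}` then removes the `x` and leaves the original trailing newlines in place.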

l0b0 · Answer 2 · 2011-09-28T12:34:48.887

5

You could use a variable:

info="$(getInfo "$SERVICE")"
SERV_NAME="$(head -1 <<<"$info" | sed -e 's/ $//')"
...

From man ksh:

<<<word       A  short  form of here document in which word becomes the
              contents of the here-document after any parameter  expan-
              sion,  command  substitution, and arithmetic substitution
              occur.

Advantages include:

Enables parallel execution.
In my experience this is much faster than temporary files. Unless you have so much data that you end up swapping, it should be orders of magnitude faster (barring disk caching, which might make a temporary file about as quick for small amounts of data).
Other processes or users can't mess up your data.

edited Sep 28 '11 at 12:34

answered Sep 28 '11 at 12:28

l0b0

<<< doesn't seem to exist in my ksh. I get an error, and I cannot seem to find it in the man page. I'm using ksh88. Are you sure this version should have this feature? – rahmu Sep 28 '11 at 13:23
Nope; I guess I didn't check the right man page (there was no mention of the version number on the web page :/ ) – l0b0 Sep 28 '11 at 13:45
<<< is bash 'here string'. I don't think it appears in any other shell. (Oh, zsh maybe...) – rozcietrzewiacz Sep 29 '11 at 06:31
@rozcietrzewiacz: Google for man ksh. It was certainly mentioned there. – l0b0 Sep 29 '11 at 08:06
Interesting. Clearly, I was wrong: even the wikipedia mentions that "you can use a here-string in bash, ksh or zsh". – rozcietrzewiacz Sep 29 '11 at 09:03
Guess how bash implements here-strings and here-docs. sleep 3 <<<"here string" & lsof -p $! | grep 0r → sleep 30251 anthony 0r REG 253,0 12 263271 /tmp/sh-thd-7256597168 (deleted) — yep, it uses a tempfile. – derobert Aug 24 '12 at 17:04
@derobert - looks like ksh93 does the same: readlink /dev/fd/0 <<<$(echo) prints /tmp/sf39.r0p (deleted). [yd]ash do pipes, but most do tempfiles. Still, those are just lockfiles, usually - they just do the open() then unlink() but retain the descriptor and then do the write. It is far more secure at least than most shell solutions - especially considering how cheap it is. Also POSIX defines a here-doc as part of a shell's input file being redirected to another command - and so maybe in that way you at least drop half of the i/o. I dunno. I like em. – mikeserv Dec 22 '14 at 19:50
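
Since ksh88 has no `<<<`, the variable approach from this answer can still be sketched with a plain pipe. The `getInfo` function here is a stand-in with an assumed output format. `printf` is used for portability; in ksh, `print -r -- "$info"` would be the native equivalent.

```shell
#!/bin/sh
# Hypothetical stand-in for the question's getInfo (output format assumed).
getInfo() {
    printf '%s \n' "$1"
    printf 'HOSTNAME=%s.example\n' "$1"
    printf 'Arguments:-v\n'
}

info=$(getInfo svcA)     # run the command once, keep the raw output in memory

# Feed the saved text to each filter through a pipe instead of <<<.
serv_name=$(printf '%s\n' "$info" | head -n 1 | sed -e 's/ $//')
serv_hostname=$(printf '%s\n' "$info" | grep HOSTNAME | cut -d "=" -f2)
serv_args=$(printf '%s\n' "$info" | grep Arguments | cut -d ":" -f2)

printf '%s | %s | %s\n' "$serv_name" "$serv_hostname" "$serv_args"
```

A pipe keeps the data off the disk, whereas a here-document may be backed by a temporary file in many shells, as the comments above point out.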

score 2 · Answer 3 · answered Sep 28 '11 at 12:17

You have two options:

You retrieve the data once (in your example with getInfo) and store it in a file as you do.
You fetch the data each time and do not store it locally, i.e., you call getInfo every time.

I do not see the problem in creating a temporary file to avoid reprocessing/re-fetching.

If you are worried about leaving the temporary file somewhere, you can always use trap to be sure to delete it in case the script is killed/interrupted:

trap "rm -f $TMPFILE" EXIT HUP INT QUIT TERM

and use mktemp to create a unique filename for your temporary file.
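
Putting trap and mktemp together might look like the following sketch. Two details are my own choices rather than part of the answer: the handler is single-quoted so that `$tmp` is expanded when the trap runs, and signals are routed through `exit` so that cleanup happens in one place. The `echo` line stands in for the real `getInfo` call.

```shell
#!/bin/sh
tmp=$(mktemp) || exit 1

# Single quotes delay expansion of $tmp until the trap actually runs.
trap 'rm -f "$tmp"' EXIT
# A handler that does not exit lets the script carry on after a signal,
# so convert signals into an exit, which in turn fires the EXIT trap.
trap 'exit 1' HUP INT QUIT TERM

echo "raw output" > "$tmp"      # stands in for: getInfo "$SERVICE" > "$tmp"
grep -c raw "$tmp"              # use the file as usual; it is removed on exit
```

With this layout the script needs no explicit `rm` at the end, since the EXIT trap covers both normal termination and the listed signals.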

score 1 · Answer 4 · answered Sep 28 '11 at 14:33

Instead of generating a file, construct shell assignment statements and evaluate that output.

for SERVICE in $(myfunc); do
    # sed turns the relevant lines into shell assignments, which eval then runs
    eval "$(getInfo $SERVICE |
               sed -n -e '1s/^\(.*[^ ]\) *$/SERV_NAME="\1"/p' \
                   -e '/HOSTNAME/s/^[^=]*=\([^=]*\).*/SERV_HOSTNAME="\1"/p' \
                   -e '/Arguments/s/^[^:]*:\([^:]*\).*/SERV_ARGS="\1"/p')"
    print $SERV_NAME $SEP $SERV_HOSTNAME $SEP $SERV_ARGS
done
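
To see what eval actually receives, the sed stage can be run on sample input first. This is only a sketch: `getInfo` is a stand-in whose output format is assumed from the question, and `printf` replaces ksh's `print`.

```shell
#!/bin/sh
# Hypothetical stand-in for getInfo (output format assumed from the question).
getInfo() {
    printf '%s \n' "$1"
    printf 'HOSTNAME=%s.example\n' "$1"
    printf 'Arguments:-v\n'
}

# sed rewrites the matching lines into shell assignment statements.
assignments=$(getInfo svcA |
    sed -n -e '1s/^\(.*[^ ]\) *$/SERV_NAME="\1"/p' \
           -e '/HOSTNAME/s/^[^=]*=\([^=]*\).*/SERV_HOSTNAME="\1"/p' \
           -e '/Arguments/s/^[^:]*:\([^:]*\).*/SERV_ARGS="\1"/p')

printf '%s\n' "$assignments"    # inspect the generated code before trusting it
eval "$assignments"
printf '%s | %s | %s\n' "$SERV_NAME" "$SERV_HOSTNAME" "$SERV_ARGS"
```

Keep in mind that eval runs whatever text it is given, so a value containing a double quote or `$(...)` would be executed as code. This approach is only reasonable when the output of getInfo is trusted.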

Or if you just want to print the information:

for SERVICE in $(myfunc); do
    getInfo $SERVICE | awk -v sep="$SEP" '
        BEGIN { OFS = sep }
        NR == 1 { sub(/ *$/, ""); SERV_NAME = $0 }
        /HOSTNAME/ { split($0, HOST, /=/); SERV_HOSTNAME = HOST[2] }
        /Arguments/ { split($0, ARGS, /:/); SERV_ARGS = ARGS[2] }
        END { print SERV_NAME, SERV_HOSTNAME, SERV_ARGS }'
done
