17

I have a bash function that takes a file as a parameter, verifies the file exists, then writes anything coming off stdin to the file. The naive solution works fine for text, but I am having problems with arbitrary binary data.

echo -n '' >| "$file" #Truncate the file
while read lines
do  # Is there a better way to do this? I would like one...
    echo $lines >> "$file"
done
mattdm
David Souther

3 Answers

15

Your loop breaks the input apart at every newline, lets read and the unquoted $lines mangle whatever $IFS whitespace and backslashes each line contains, and then has echo append a line break of its own. Instead of breaking the stream into lines, just take the whole thing and pass it along. You can reduce the entire bit of code above to this:

cat - > "$file"

You don't need the truncate step: the redirection truncates the file, and cat writes the whole stdin stream out to it.
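As a rough sketch of how this could fit the function described in the question: the name save_stdin and the error message below are made up for illustration and are not from the original post. It assumes, as the question says, that the target must already exist, and it uses >| only because the asker has noclobber set.

```bash
# Hypothetical helper: copy everything on stdin into an existing file.
save_stdin() {
    local file=$1

    # Refuse to write unless the target already exists as a regular file.
    if [ ! -f "$file" ]; then
        printf 'save_stdin: no such file: %s\n' "$file" >&2
        return 1
    fi

    # >| truncates even when noclobber is set; cat copies stdin
    # byte for byte, so binary data passes through unchanged.
    cat >| "$file"
}
```

Usage would look like `some_command | save_stdin out.bin`, where some_command and out.bin are placeholders.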

Edit: If you are using zsh, you can just use > $file in place of the cat. The redirection truncates the file, and anything waiting on stdin at that point gets read into it. I think you can do something like this with bash, but you would have to set some special mode.

Caleb
  • I couldn't get the stdin redirect example to work, but changing the cat example to >| (I have noclobber set) works like a charm. Thanks for making my day ^.^ – David Souther Aug 19 '11 at 19:59
  • +1 for the cat-less version. Always avoid useless cats ;) – rozcietrzewiacz Aug 19 '11 at 20:01
  • @rozcietrzewiacz: True, except it was an afterthought and I was wrong. This might not be a useless use of cat. The only thing you might be able to do is > $file. This only works as the first thing that looks for stdin in the parent shell script. Basically all of David's code can be reduced to a single character, but I think the cat - is more elegant and less trouble-prone because it's understood on sight. – Caleb Aug 19 '11 at 20:10
  • Sometimes I string four or five cats together, just to annoy UUOC fanatics – Michael Mrozek Aug 19 '11 at 20:32
  • @MichaelMrozek: Sometimes I name my data files cat just so people who insist on using it necessarily have to do mental gymnastics to read the code. Named pipes are also good targets. – Caleb Aug 19 '11 at 20:37
  • @Michael I didn't know people get so dogmatic about it :D Hmm, maybe I did get symptoms of this disease. Caleb's example seemed very simple and thus appealed to me. As it was wrong, well, then it seems a good use of cat after all. I love the way one can learn such stuff here. – rozcietrzewiacz Aug 20 '11 at 00:00
  • @rozcietrzewiacz Cats are just fine. In many cases, I think they make code more readable. By putting the cat first in the pipe, then the grep, I see very clearly what is being worked on. It agrees with our natural language, as well: cat file | grep foo is subject-verb, whereas grep foo file is verb file (and grep foo < file is obtuse). – David Souther Aug 21 '11 at 15:47
  • @caleb Yes. To both :) – David Souther Aug 21 '11 at 15:48
  • @DavidSouther: I disagree with your analysis of natural English language. In English, commands often start with verbs. In natural language grep foo file could be "Go look for bar in the third drawer of my file cabinet." and grep foo < file could be "Find the paper labeled bar in this stack of papers." The subject is an implied "you", which is left off since you are speaking to the shell. – Caleb Aug 22 '11 at 09:30
  • @Caleb I'd buy that. It comes down to how the programmer thinks about the problem, and how the maintainer expects to read the solution. – David Souther Aug 22 '11 at 15:52
  • @DavidSouther: True. Any syntax you are not accustomed to using is going to be more difficult to understand than one you are, I just wouldn't say command args < data is inherently obtuse. – Caleb Aug 22 '11 at 16:05
7

To read a text file literally, don't use plain read, which processes the input in two ways:

  • read interprets \ as an escape character; use read -r to turn this off.
  • read splits into words on characters in $IFS; set IFS to an empty string to turn this off.

The usual idiom to process a text file line by line is

while IFS= read -r line; do …

For an explanation of this idiom, see "Why is while IFS= read used so often, instead of IFS=; while read..?".
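A small experiment makes the two effects visible. This is only an illustration: the sample string and the variable names are invented here, and the brackets are there to show where whitespace survives.

```bash
sample='  a\tb  '   # two leading spaces, a literal backslash, two trailing spaces

# Plain read: strips the surrounding IFS whitespace and eats the backslash.
plain=$(printf '%s\n' "$sample" | { read line; printf '%s' "$line"; })

# IFS= read -r: keeps the line exactly as it was (minus the newline).
literal=$(printf '%s\n' "$sample" | { IFS= read -r line; printf '%s' "$line"; })

printf 'plain read:    [%s]\n' "$plain"     # prints: plain read:    [atb]
printf 'IFS= read -r:  [%s]\n' "$literal"   # prints: IFS= read -r:  [  a\tb  ]
```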

To write a string literally, don't just use plain echo, which processes the string in two ways:

  • On some shells, echo processes backslash escapes. (On bash, it depends whether the xpg_echo option is set.)
  • A few strings are treated as options, e.g. -n or -e (the exact set depends on the shell).

A portable way of printing a string literally is with printf. (There's no better way in bash, unless you know your input doesn't look like an option to echo.) Use the first form to print the exact string, and the second form if you want to add a newline.

printf %s "$line"
printf '%s\n' "$line"
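To see why this matters, here is a short sketch with two awkward strings (both chosen purely for illustration). With bash's builtin echo the first one is swallowed as an option, so with_echo should come out empty there; other shells may behave differently, which is the point.

```bash
line='-n'
with_echo=$(echo "$line")              # bash's builtin echo treats -n as an option
with_printf=$(printf '%s\n' "$line")   # printf treats the string as plain data

path_like='C:\temp\new'
kept=$(printf '%s\n' "$path_like")     # %s leaves the backslashes alone

printf 'echo:   [%s]\n' "$with_echo"
printf 'printf: [%s]\n' "$with_printf"   # prints: printf: [-n]
printf 'printf: [%s]\n' "$kept"          # prints: printf: [C:\temp\new]
```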

This is only suitable for processing text, because:

  • Most shells will choke on null characters in the input.
  • When you've read the last line, you have no way to know if there was a newline at the end or not. (Some older shells may have bigger trouble if the input doesn't end with a newline.)
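The second limitation is easy to reproduce. In the sketch below (function names are hypothetical), read returns non-zero on a final line that lacks a newline, so a plain loop silently skips it. The `|| [ -n "$line" ]` test is a common workaround for processing that last fragment, though it still cannot tell you whether the newline was there.

```bash
# Count the lines a plain read loop actually processes.
count_naive() {
    n=0
    while IFS= read -r line; do n=$((n + 1)); done
    printf '%s\n' "$n"
}

# Same loop, but also handle a final line without a trailing newline.
count_tolerant() {
    n=0
    while IFS= read -r line || [ -n "$line" ]; do n=$((n + 1)); done
    printf '%s\n' "$n"
}

printf 'one\ntwo' | count_naive         # prints 1: "two" is skipped
printf 'one\ntwo' | count_tolerant      # prints 2
printf 'one\ntwo\n' | count_tolerant    # prints 2
```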

You can't process binary data in the shell, but modern versions of utilities on most unices can cope with arbitrary data. To pass all input through to the output, use cat. Going on a tangent, echo -n '' is a complicated and non-portable way of doing nothing; echo -n would be just as good (or not depending on the shell), and : is simpler and fully portable.

: >| "$file"
cat >>"$file"

or, simpler,

cat >|"$file"

In a script, you usually don't need to use >| since noclobber is off by default.
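For anyone who does turn noclobber on, the difference between > and >| can be checked with a throwaway file. This is just a sketch: the temporary file and variable names are invented, and the subshells are there so a failed redirection cannot abort the surrounding script.

```bash
tmp=$(mktemp)   # an existing, empty scratch file

set -C          # same as: set -o noclobber

# Plain > refuses to overwrite an existing regular file under noclobber.
if ( : > "$tmp" ) 2>/dev/null; then plain=allowed; else plain=refused; fi

# >| overrides noclobber for this one redirection.
if ( : >| "$tmp" ) 2>/dev/null; then forced=allowed; else forced=refused; fi

set +C          # restore the default
rm -f "$tmp"

printf '>  : %s\n' "$plain"    # prints: >  : refused
printf '>| : %s\n' "$forced"   # prints: >| : allowed
```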

  • thanks for pointing out xpg_echo, that's actually a problem I was having somewhere else in my code and didn't even realize. Re noclobber, I am in the habit of turning it on in my bashrc. – David Souther Aug 21 '11 at 15:49
1

This will do exactly what you want:

( # With no variable name, read stores each chunk in REPLY unmodified.
  while read -r -d '' ; do
    printf '%s\0' "${REPLY}" ;
  done ;

  # When read hits EOF, it returns non-zero which exits the while loop.
  # That data still needs to be output:
  printf %s "${REPLY}"
) >> "${file}"

Do note the memory usage though. This reads input in a null-delimited fashion.

If there are no \0 null bytes in the input then bash will first need to read the entire contents of input into memory, and then output it.

Regarding your truncate step:

echo -n '' >| "$file" #Truncate the file

a much simpler equivalent is:

>| "${file}"   #Truncate the file (plain > is enough when noclobber is off)
lmcanavals