13

I'm trying to do some tricks with dd. I thought it would be possible to store some hexvalues in a variable called "header" to pipe it into dd.

My first step without a variable was this:

$ echo -ne "\x36\xc9\xda\x00\xb4" |dd of=hex
$ hd hex

00000000  36 c9 da 00 b4                                    |6....|
00000005

After that I tried this:

$ header=$(echo -ne "\x36\xc9\xda\x00\xb4") 
$ echo -n $header | hd

00000000  36 c9 da b4                                       |6...|
00000004

As you can see I lost my \x00 value in the $header variable. Does anyone have an explanation for this behavior? This is driving me crazy.

jwodder
Frank

4 Answers

16

You can't store a null byte in a string because Bash uses C-style strings, which reserve the null byte for terminators. So you need to rewrite your script to simply pipe the sequence that contains the null byte without Bash needing to store it in the middle. For example, you can do this:

printf "\x36\xc9\xda\x00\xb4" | hd

Notice, by the way, that you don't need echo; you can use Bash's printf for this and many other simple tasks.

Or instead of chaining, you can use a temporary file:

printf "\x36\xc9\xda\x00\xb4" > /tmp/mysequence
hd /tmp/mysequence

Of course, this has the problem that the file /tmp/mysequence may already exist. And now you need to keep creating temporary files and saving their paths in strings.

Or you can avoid that by using process substitution:

hd <(printf "\x36\xc9\xda\x00\xb4")

The <(command) operator makes the output of command available under a file name: a /dev/fd entry on systems that support it, or a named pipe in the file system otherwise. hd receives that path as its first argument and opens and reads it almost like any file. You can read more about it here: https://unix.stackexchange.com/a/17117/136742.
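
If the goal is still to keep the header "in a variable", one more option (a sketch that is not part of the answer above) is to store the escape sequences as plain text and expand them only at the point of use. The function name emit_header is hypothetical, and od stands in for hd only because it is more widely available. This assumes Bash's builtin printf, whose %b conversion expands backslash escapes the same way echo -e does:

```bash
#!/usr/bin/env bash
# Keep the escape sequences (plain text), not the raw bytes, in the variable.
header='\x36\xc9\xda\x00\xb4'

# emit_header is a hypothetical helper: it expands the escapes only at the
# point of use, so the NUL byte goes straight into the pipe and never has
# to be held in a shell variable.
emit_header() {
    printf '%b' "$header"
}

# Inspect the bytes; "emit_header | dd of=hex" would work the same way.
emit_header | od -An -tx1
```

Passing the variable through '%b' rather than using it as the format string means a stray % in the data is not interpreted by printf.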

giusti
  • 1
    While correct, this is an implementation detail and not the exact reason. I looked at it, and the POSIX standard actually requires this behaviour, so there you have the actual reason. (As some have pointed out, zsh will do it, but only in non-POSIX mode.) I actually looked into it because I was wondering if it was worth implementing this in mksh – mirabilos Feb 24 '17 at 19:33
  • @mirabilos, would you care to expand on that? AFAICT, behaviour is unspecified per POSIX for command substitution when the output has NUL characters, and for zsh in POSIX mode, the only relevant difference I can think of is that in sh emulation, \0 is not in the default value of $IFS. echo "$(printf 'a\0b')" still works OK in sh emulation in zsh. – Stéphane Chazelas Feb 24 '17 at 21:48
  • 4
    @mirabilos Considering that the shells predates the POSIX standard by a decade or more, I guess you could find out that the actual actual reason is that shells used C-style strings and the standard was built around that. – giusti Feb 25 '17 at 01:49
  • I found a good Q for detailed discussion on printf versus echo. http://unix.stackexchange.com/questions/65803/why-is-printf-better-than-echo – Paulb Feb 25 '17 at 12:34
9

You can use zsh instead which is the only shell that can store the NUL character in its variables. That character even happens to be in the default value of $IFS in zsh.

nul=$'\0'

Or:

nul=$'\x0'

Or

nul=$'\u0000'

Or

nul=$(printf '\0')

Note, however, that you can't pass such a variable as an argument or environment variable to an external command: arguments and environment variables are passed to the execve() system call as NUL-terminated strings (a limitation of the system's API, not the shell). In zsh you can, however, pass NUL bytes as arguments to functions or builtin commands.

echo $'\0' # works
/bin/echo $'\0' # doesn't
  • 1
    "You can use zsh instead". No thanks - I'm teaching myself bash scripting as a beginner right now. I don't want to confuse myself with another syntax. But thank you very much for suggesting it – Frank Feb 24 '17 at 18:53
  • As a matter of fact, you used zsh syntax in your question. echo -n $header to mean to pass the content of the $header variable as a last argument to echo -n is zsh (or fish or rc or es) syntax, not bash syntax. In bash, that has a very different meaning. More generally zsh is like ksh (bash, the GNU shell, being more or less a part-clone of ksh, the Unix de-facto shell) but with most of the design idiosyncrasies of the Bourne shell fixed (and a lot of extra features, and a lot more user-friendly/less astonishing). – Stéphane Chazelas Feb 24 '17 at 20:03
  • Be careful: zsh may change a zero byte sometimes: echo $(printf 'ab\0cd') | od -vAn -tx1c prints 61 62 20 63 64 0a, that is, a space where a NUL should exist. –  Feb 24 '17 at 20:58
  • @sorontar, yes, as I said \0 is in the default $IFS, so $(printf 'ab\0cd') is split into ab and cd. Try with echo "$(printf 'ab\0cd')" instead. – Stéphane Chazelas Feb 24 '17 at 21:15
  • 1
    And that is something no other (none, nil) shell will reproduce. That makes a script behave in very special ways in zsh. In my opinion: zsh is just trying to be too clever. –  Feb 24 '17 at 21:17
  • 2
    zsh having "fixed" the design misfeatures present in the POSIX sh standard means that getting accustomed to writing zsh scripts is getting accustomed to practices which would be buggy if exercised in any other shell. This isn't such a problem with a syntax that's so unlike a different language that skills or habits aren't likely to transfer, but such is not the case at hand. – Charles Duffy Feb 24 '17 at 21:19
  • @sorontar, both echo $(printf 'ab\0cd') and echo "$(printf 'ab\0cd')" are unspecified in POSIX and not working "properly" in every other shell. OTOH, the behaviour is clearly specified in zsh and works as documented. It makes perfect sense to split on the NUL byte by default. That can be useful in ls -ld -- $(grep -rZl whatever .) though you'd rather write ls -ld -- ${(0)"$(grep -rZl whatever .)"} in that case, as you don't want to split on the other $IFS character. – Stéphane Chazelas Feb 24 '17 at 21:38
  • @CharlesDuffy, that is a fair point. OTOH, shells like rc (or to some extent fish) with a radically different syntax and that have fixed the Bourne issues never took off for the very reason that they're not Bourne-like. IMO, zsh's stance is courageous and laudable here and a step in the right direction. – Stéphane Chazelas Feb 24 '17 at 21:41
  • @StéphaneChazelas I will be bold also and ask a very naive question: isn't zsh supposed to: In ZSH, however, word splitting is disabled by default (which is great), that should mean that the "Command Substitution" string should not be split. I am sure that I am wrong and I will be clearly corrected by stating clearly why in this particular case the naive question I am making is invalid. But that just miss the point: One has to be an expert in zsh to make it work the way one wants. Simple users easily get lost . –  Feb 24 '17 at 23:39
  • @sorontar, word splitting (but not globbing which would not make sense) happens in zsh upon command substitution, because that's generally what you want. (though in that specific case, I'm not sure I agree with that particular design decision). Generally zsh chooses the path of least astonishment, that's the opposite of needing to be expert to work with it, I can't think of where you're getting that from. – Stéphane Chazelas Feb 25 '17 at 00:15
  • @StéphaneChazelas, zsh does not actually store raw NUL characters in a variable. Just as ksh93 uses a 'hack' (base64) to store NUL and other characters in binary variables, zsh also uses a 'hack' to store NUL (and some other characters) in a variable - a Meta byte (0x83) followed by a byte containing 'character xor 32'. See zsh.h. – fpmurphy Jan 14 '18 at 16:16
  • @fpmurphy1, that's internal only and transparent to the user. In zsh, $var[1] for instance gets the first character of $var whether it's a NUL character or other. How zsh stores it internally is irrelevant as it's not visible to the user. That's different in ksh93. In ksh93, If a $var contains the base64 encoding of abc, ${var:0:1} will contain the first character of that base64 encoding, not a, which is not useful. ${#var} will expand to the length of the encoding, not the length of the data it is meant to represent. – Stéphane Chazelas Jan 14 '18 at 16:43
1

Bash uses C strings internally which cannot store the null byte. Store the value in a temporary file like this:

    zHex=$(mktemp --tmpdir "$(basename "$0")-XXXX")
    trap "rm -f ${zHex@Q}" EXIT

The variable zHex now contains a unique file name. The file referenced by $zHex can be deleted manually, but the file will be automatically deleted when the program terminates for any reason.

Then use the variable like this:

    echo -ne "\x36\xc9\xda\x00\xb4" > "$zHex"
    hd "$zHex"

This does NOT store the value with null bytes into a variable. Instead, it uses a variable to store the name of a file. The file, like any other file, may contain null bytes and can be used over and over. The file itself will most likely never be physically written to the disk.

Via a trap, bash deletes the file automatically, so you need not worry about removing it manually unless you are creating a crazy array of garbage. Due to RAM buffering, this technique is decently fast.
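
Put together, the steps above might look like the following sketch. It keeps the zHex name and the byte values from the question, but two details are assumptions that differ from the snippets above: plain mktemp is used instead of the GNU-specific --tmpdir form, and the trap is single-quoted so that "$zHex" is expanded when the trap runs, which avoids the need for ${zHex@Q} (Bash 4.4 or later). od again stands in for hd.

```bash
#!/usr/bin/env bash
# Create a unique temporary file; the variable holds its path, not its bytes.
zHex=$(mktemp) || exit 1

# Single quotes delay expansion until the trap fires at exit.
trap 'rm -f -- "$zHex"' EXIT

# Write the bytes, including the NUL, into the file.
printf '\x36\xc9\xda\x00\xb4' > "$zHex"

# The file can be read as often as needed; the NUL byte survives each time.
wc -c < "$zHex"
od -An -tx1 "$zHex"
```

Either quoting style works; the single-quoted trap is simply the form that also runs on older Bash versions.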

Paul
  • There seems to be code missing. You also save the temporary file's pathname in zHeader, but then appear to remove $zTemp (but with literal quotes inserted around the name with @Q, for some unexplained reason). The answer is correct, but the code is irrelevant to the question. – Kusalananda Aug 31 '21 at 22:46
0

Since a carriage return can be part of a file name, I like to use null-terminated lists. But I cannot store the string with null bytes, because Bash stores strings as simple C strings, where the null byte is the string terminator and therefore cannot be part of the string itself.

To get around the problem, I create an array of strings where the null byte is assumed to exist after each element. The list itself obviously contains null bytes. I store the value containing null bytes in an array like this...

    readarray -d $'\0' zArray < <(null_terminated_list_maker)

Then, I can reproduce the value with the null bytes like this...

    [[ "${zArray[*]}" ]] && printf '%s\0' "${zArray[@]}"

In this manner, a bash array can be used to store any value containing null bytes.

The purpose of the [[ "${zArray[*]}" ]] test is to see if the array has any values at all (the null string is a value). The test solves the problem that, if an empty array is passed to printf like this, printf will print one null byte, which is wrong. It should print nothing.

When representing completely arbitrary data, there is a problem: did your input actually terminate with the null byte? This method needs to be expanded to handle data which may or may not end with the null byte.
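
One way that extension could look is sketched below. It is not the method above: it swaps readarray for a read -d '' loop, because read reports failure when the input stops without a delimiter while still handing back the leftover chunk. The names read_nul_list, write_nul_list and zTrailingNul are hypothetical.

```bash
#!/usr/bin/env bash
# Read NUL-delimited data from stdin into the global array zArray and
# record in zTrailingNul whether the last chunk was NUL-terminated.
read_nul_list() {
    zArray=()
    zTrailingNul=1
    local chunk
    while IFS= read -r -d '' chunk; do
        zArray+=("$chunk")
    done
    # read fails at end of input; a non-empty leftover chunk means the
    # data stopped without a closing NUL.
    if [[ -n $chunk ]]; then
        zArray+=("$chunk")
        zTrailingNul=0
    fi
}

# Write the stored data back out, byte for byte.
write_nul_list() {
    local n=${#zArray[@]}
    (( n == 0 )) && return 0
    if (( zTrailingNul )); then
        printf '%s\0' "${zArray[@]}"
    else
        # Every element but the last gets its NUL back.
        (( n > 1 )) && printf '%s\0' "${zArray[@]:0:n-1}"
        printf '%s' "${zArray[n-1]}"
    fi
}
```

Call it with a redirection rather than a pipe, for example read_nul_list < <(printf 'one\0two'), so that the array is filled in the current shell and not in a subshell.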

Paul
  • Please note that the original problem arises from NULL bytes inside a string variable, whereas your post concerns strings that are terminated by a NULL byte. – AdminBee Aug 27 '21 at 10:30
  • I read the first comment about data containing null bytes. Maybe my rewording of the first paragraph will help you understand. The problem with this solution is the terminating null byte. – Paul Aug 31 '21 at 22:33