58

A colleague suggested creating a random key via the following command:

tr -dc A-Za-z0-9_\!\@\#\$\%\^\&\*\(\)-+= < /dev/urandom | head -c 32 | xargs

It gave me the error:

tr: Illegal byte sequence

I'm concerned that I do not have /dev/urandom on my system. I tried googling to figure out how to install this file, but I have come up empty. I tried locate urandom and also came up empty. (well actually, it found the man page, but that doesn't help)

How do I make urandom available on my Mac OSX system? (Lion)

Kirk Woll
  • 1,127

6 Answers

72

Based on the error message that you get, I don't think /dev/urandom is the problem. If it were, I'd expect an error like no such file or directory.
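A quick way to rule the device out is to test for it directly. This is only a sketch; the variable name urandom_status is made up for illustration:

```shell
# Check whether /dev/urandom exists and is a character device.
if [ -c /dev/urandom ]; then
    urandom_status="present"
else
    urandom_status="missing"
fi
echo "/dev/urandom is $urandom_status"

# Show the device node itself (the listing differs between systems).
ls -l /dev/urandom
```

If this reports "present", the device is fine and the error has to come from somewhere else in the pipeline.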

I searched for the error message you got and found this, which seems like it might be relevant to your issue: nerdbynature.de 2010-04-11 tr-Illegal-byte-sequence (Web Archive's 2019-09 snapshot)

Basically, specify the locale by prepending the tr command with LC_CTYPE=C (or LC_ALL=C, see comments):

LC_CTYPE=C tr -dc A-Za-z0-9_\!\@\#\$\%\^\&\*\(\)-+= < /dev/urandom | head -c 32 | xargs
lk-
  • 3,723
  • Thanks, that indeed did the trick. Any idea why I cannot find urandom or random? Are they special magical "files" that don't exist on the actual filesystem? (Also I suggested an edit to help mitigate link-rot) – Kirk Woll Aug 13 '12 at 18:53
  • 1
    I believe locate doesn't directly search your filesystem, but rather looks up your query using a pre-built database. This database is most likely configured to ignore /dev/ and other 'special' filesystems. – lk- Aug 13 '12 at 19:01
  • fair enough, but I don't see it when I look directly in /dev. Go figure. But thanks again for the help. – Kirk Woll Aug 13 '12 at 19:04
  • 2
    doesn't seem to work on 10.9; still fails with the same error message. LC_ALL=C does the trick tho. – Erik Kaplun Mar 24 '15 at 11:31
  • 1
    Can confirm LC_CTYPE=C and LC_ALL=C both work on Catalina (10.15.7, specifically). – Jivan Pal Mar 25 '21 at 19:09
  • 1
    LC_CTYPE=C didn't work for me on macOS 11.4 (Big Sur), but LC_ALL=C worked. – Hosam Aly Jun 28 '21 at 23:02
19

Your tr attempts to interpret its input as UTF-8 text, so it complains and aborts at the first byte sequence that is not valid UTF-8. Prefixing tr with LC_ALL=C or LC_CTYPE=C sets that variable in the environment of tr, which switches its character set to that of the C locale, where everything is just a sequence of opaque bytes.
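To see the effect in isolation, here is a small sketch that feeds tr one byte that is invalid UTF-8 on its own (0x80, written as octal \200) between two letters. The variable name filtered is just for illustration:

```shell
# 0x80 (octal 200) by itself is not a valid UTF-8 sequence.
# In the C locale tr treats each byte as one character, so the
# stray byte is simply deleted along with everything outside a-z.
filtered=$(printf 'a\200b' | LC_ALL=C tr -dc 'a-z')
echo "$filtered"   # prints: ab
```

Without the LC_ALL=C prefix, the same input is the kind of thing that can trigger "Illegal byte sequence" on OS X in a UTF-8 locale.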

By the way, is the sequence \)-+ in your command intentional? This includes * as well, which you already included, but does not include - itself as you might have intended. Better to write one of these instead:

LC_ALL=C tr -dc 'A-Za-z0-9_!@#$%^&*()\-+=' < /dev/urandom
LC_CTYPE=C tr -dc A-Za-z0-9_\!\@\#\$\%\^\&\*\(\)\\-+= < /dev/urandom
MvG
  • 4,411
9

As others have indicated, your problem isn't that /dev/urandom is missing, but rather how tr works on OS X. Instead of messing around with environment variables, use perl in place of tr:

perl -pe 'binmode(STDIN, ":bytes"); tr/A-Za-z0-9_\!\@\#\$\%\^\&\*\(\)-+=//dc;' < /dev/urandom | head -c 32; echo

This has the advantage of being portable across OS X, Redhat and Ubuntu.

(I also removed the pipe to xargs, replacing it with echo, to get a newline at the end of the output.)

Trenton
  • 191
  • Sooner or later, I expect Perl to make binmode ":utf8" standard, at which point your Perl solution will have the same problem that tr does. – Mark Jul 21 '15 at 03:02
  • Addressed Mark's concern by adding binmode(STDIN, ":bytes") to the code sample. – Trenton Feb 16 '17 at 17:08
  • 3
    Sorry, but I kinda fail to see, why a dependency on perl is more desirable than just prefixing a "LC_ALL=C" to a command. – Grmpfhmbl Oct 01 '20 at 13:32
4

Firstly, did you intend to include - or * in the list of valid characters? The parameter to tr includes the sequence )-+, which means "the byte range starting with ) and ending with +", which is actually )*+.
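The range behaviour is easy to check on a fixed input. A minimal sketch, with as_range and as_literal as made-up variable names:

```shell
# ')-+' is a range from ')' to '+': it keeps ')', '*' and '+', and drops '-'.
as_range=$(printf '%s' ')*+-' | LC_ALL=C tr -dc ')-+')

# With '-' as the last character it is a literal: ')', '+' and '-' are kept.
as_literal=$(printf '%s' ')*+-' | LC_ALL=C tr -dc ')+-')

echo "range:   $as_range"     # prints: range:   )*+
echo "literal: $as_literal"   # prints: literal: )+-
```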

Secondly, rather than reading many kilobytes from the kernel's entropy pool (and thus marking the entire pool as insecure, which will impact any other processes that need secure entropy), consider reading only as many bits as you need: use head -c... as the first step, and then translate rather than discard unwanted characters.

This particular version of the problem is a bit unusual in that it uses 76 different symbols; most people just want alphanumerics, so if you'd be satisfied with just 64 symbols, then using the base64 utility will minimize consumption of the entropy pool (note that 24 is 6/8 of 32):

head -c24 < /dev/random | base64
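The arithmetic can be checked like this. Note this sketch reads /dev/urandom rather than /dev/random (my substitution, so it cannot block on older Linux kernels), and the variable name key is made up. Also keep in mind that the base64 alphabet includes + and /, so the output is not strictly alphanumeric:

```shell
# 24 random bytes are 192 bits; base64 emits one character per 6 bits,
# so the result is 32 characters with no '=' padding.
key=$(head -c 24 /dev/urandom | base64)
echo "$key"
echo "length: ${#key}"
```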
  • This is the most concise command with much better output! Even after fixing the errors, the other commands were giving me lots of question mark characters. – derpedy-doo Aug 26 '22 at 18:46
2

Your locale's character encoding (which you can check with locale charmap) is a multi-byte one, where a single character can take more than one byte.

The most common nowadays is UTF-8, where characters can be encoded over 1 to 4 bytes. Not all sequences of bytes form valid characters in UTF-8. Every non-ASCII character in UTF-8 starts with one byte that has the two highest bits set and tells how many bytes with the highest (but not second-highest) bit set follow.

/dev/urandom contains a random stream of bytes. tr transliterates characters, so it needs to decode those bytes as characters. The ASCII characters in your range are all encoded on one byte in UTF-8, but tr still needs to decode all the characters. There are, for instance, other multi-byte encodings where some characters other than A contain the 0x41 byte (the code for A).

That random stream of bytes is bound to contain invalid sequences (for instance, a 0x80 byte by itself is invalid in UTF-8, as a non-ASCII character has to start with a byte greater than 0xc1; 0xc0 and 0xc1 appear in no UTF-8 character), so tr returns with an error when that happens.

What you want here is to treat that stream of bytes as characters in an encoding that has one byte per character. Which one you choose is not important, as all the characters in your range (assuming by A-Z, you meant ABCDEFGHIJKLMNOPQRSTUVWXYZ and not things like Ý, Ê) are part of the portable character set and so are encoded the same in all the charsets supported on your system.

For that, you'd set the LC_CTYPE localisation variable, which is the one that decides which charset is used and what character classes like blank and alpha contain. But for the definition of the A-Z range, you'll also want to set the LC_COLLATE variable (the one that determines string ordering).

The C aka POSIX locale is one that guarantees characters are single-bytes and A-Z is ABCDEFGHIJKLMNOPQRSTUVWXYZ. You could do:

 LC_CTYPE=C LC_COLLATE=C tr -dc 'A-Za-z0-9_!@#$%^&*()+=-'

(here moving the - to the end; otherwise, )-+ would be taken as a range like A-Z)

But note that the LC_ALL variable overrides all the other LC_* and LANG variables. So, if LC_ALL is otherwise already defined, the above will have no effect. So instead you can simply do:

 LC_ALL=C tr -dc 'A-Za-z0-9_!@#$%^&*()+=-'

That will affect other things like the language of the error messages, but anyway, changing LC_CTYPE could already have been a problem for error messages (for instance, no way to express Russian or Japanese error messages in the charset of the C locale).
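Put together with the rest of the original pipeline, a sketch could look like the following. The variable names allowed and key are made up, and reading a bounded 1024-byte chunk instead of the endless stream is my own arbitrary choice so that nothing in the pipeline is cut off mid-write; on average roughly 300 of those bytes survive the filter, far more than the 32 needed:

```shell
# Character set with '-' last so it is taken literally, not as a range.
allowed='A-Za-z0-9_!@#$%^&*()+=-'

# Take a bounded chunk of random bytes, keep only allowed characters
# (byte-wise, thanks to LC_ALL=C), then cut the result to 32 characters.
key=$(head -c 1024 /dev/urandom | LC_ALL=C tr -dc "$allowed" | head -c 32)
echo "$key"
```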

0

According to the man page, /dev/random is probably going to be sufficient for your needs. Perhaps Apple ceased to create the /dev/urandom because it is unnecessary?

jsbillings
  • 24,406
  • I do not have /dev/random either. – Kirk Woll Aug 13 '12 at 18:23
  • MacOSX should have both /dev/random and /dev/urandom. Perhaps Apple no longer includes those special files anymore? Or maybe it's only there if you install XCode? – jsbillings Aug 13 '12 at 18:36
  • 1
    FWIW, both devices are present on my Lion-upgraded-to-Mountain Lion workstation. I believe it was present on Lion, as well. Nodes are different as well (13,0 vs. 13,1) – mrb Aug 13 '12 at 18:41