Your locale's character encoding (which you can check with locale charmap) is one that uses multiple bytes per character.
The most common nowadays is UTF-8, where characters are encoded over 1 to 4 bytes. Not all sequences of bytes form valid characters in UTF-8. Every non-ASCII character in UTF-8 starts with one byte that has the two highest bits set and tells how many bytes with the highest (but not second-highest) bit set follow.
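The byte layout described above can be inspected from the shell. This is a minimal sketch: hex_bytes is a hypothetical helper name, and the sample characters are written as octal escapes so that the result does not depend on the encoding of your terminal or script file.

```shell
# Hypothetical helper: print stdin as a hex string, two digits per byte.
hex_bytes() { od -An -tx1 | tr -d ' \n'; }

printf 'A' | hex_bytes; echo                  # ASCII, 1 byte
printf '\303\251' | hex_bytes; echo           # U+00E9 (é), 2 bytes
printf '\342\202\254' | hex_bytes; echo       # U+20AC (€), 3 bytes
printf '\360\237\230\200' | hex_bytes; echo   # U+1F600, 4 bytes
```

The lead bytes are 0xc3 (110xxxxx), 0xe2 (1110xxxx) and 0xf0 (11110xxx); each is followed by one, two and three bytes of the form 10xxxxxx respectively.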
/dev/urandom contains a random stream of bytes. tr transliterates characters, so it needs to decode those bytes as characters. The ASCII characters in your range are all encoded on one byte in UTF-8, but tr still needs to decode all the characters. There are for instance other multi-byte encodings where some characters other than A contain the 0x41 byte (the code for A).
That random stream of bytes is bound to contain invalid sequences (for instance, a 0x80 byte by itself is invalid in UTF-8, as a non-ASCII character has to start with a byte greater than 0xc1; 0xc0 and 0xc1 appear in no UTF-8 character), so tr returns with an error when that happens.
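One way to see that a lone 0x80 byte is not valid UTF-8, without relying on how a particular tr implementation reacts, is to ask iconv to decode it. This is only a sketch: is_valid_utf8 is a hypothetical helper name, and it assumes an iconv that exits with a non-zero status on undecodable input.

```shell
# Hypothetical helper: succeeds only if stdin decodes cleanly as UTF-8.
is_valid_utf8() { iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1; }

# A plain ASCII byte is valid UTF-8.
printf 'A' | is_valid_utf8 && echo "0x41: valid"

# A continuation byte with no lead byte before it is not.
printf '\200' | is_valid_utf8 || echo "0x80 alone: invalid"
```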
What you want here is to consider that stream of bytes as characters in an encoding that has one byte per character. Which one you choose is not important, as all the characters in your range (assuming that by A-Z you meant ABCDEFGHIJKLMNOPQRSTUVWXYZ and not things like Ý, Ê) are part of the portable character set, so they are encoded the same in all the charsets supported on your system.
For that, you'd set the LC_CTYPE localisation variable, which is the one that decides which charset is used and what character classes like blank and alpha contain. But for the definition of the A-Z range, you'll also want to set the LC_COLLATE variable (the one that decides string ordering).
The C (aka POSIX) locale is one that guarantees that characters are single bytes and that A-Z is ABCDEFGHIJKLMNOPQRSTUVWXYZ. You could do:
LC_CTYPE=C LC_COLLATE=C tr -dc 'A-Za-z0-9_!@#$%^&*()+=-'
(here moving the - to the end; otherwise, )-+ would be taken as a range like A-Z)
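The effect of the position of - can be checked directly. A small sketch, using an arbitrary test string:

```shell
# ')-+' is a range: it covers ')' (0x29), '*' (0x2a) and '+' (0x2b),
# so the '*' is deleted and the '-' is left alone.
printf 'a-b*c\n' | LC_ALL=C tr -d ')-+'

# With '-' last it is a literal: ')', '+' and '-' are deleted, '*' is kept.
printf 'a-b*c\n' | LC_ALL=C tr -d ')+-'
```

The first command prints a-bc and the second prints ab*c.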
But note that the LC_ALL variable overrides all the other LC_* and LANG variables. So, if LC_ALL is already defined, the above will have no effect. Instead you can simply do:
LC_ALL=C tr -dc 'A-Za-z0-9_!@#$%^&*()+=-'
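To check that LC_ALL takes precedence, you can ask the locale utility which settings are in effect. A sketch, where en_US.UTF-8 is only an example locale name (it does not need to be installed for this check):

```shell
# LC_CTYPE is set explicitly, but LC_ALL is set as well and wins.
# The value should be shown in double quotes, which is how locale marks
# a setting that is implied by another variable rather than set directly.
LC_ALL=C LC_CTYPE=en_US.UTF-8 locale | grep '^LC_CTYPE='
```

On systems that follow the POSIX description of the locale utility, this should report C for LC_CTYPE rather than en_US.UTF-8.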
That will affect other things like the language of the error messages, but changing LC_CTYPE could already have been a problem for error messages anyway (for instance, there is no way to express Russian or Japanese error messages in the charset of the C locale).
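Putting it together, a possible way to use the command is sketched below. random_string is a hypothetical helper name and the default length of 16 is arbitrary; head -c is not specified by POSIX, though it is widely available.

```shell
# Hypothetical helper: print N (default 16) random characters from the set.
random_string() {
  # LC_ALL=C makes tr treat every input byte as one character, so no
  # byte sequence from /dev/urandom can be invalid.
  LC_ALL=C tr -dc 'A-Za-z0-9_!@#$%^&*()+=-' < /dev/urandom |
    head -c "${1:-16}"
}

random_string 20; echo
```

Since LC_ALL is set only for that one tr invocation, the rest of the script keeps the user's locale.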
xargs ... – sendmoreinfo May 16 '13 at 19:41