That would be because the [...] matches on a character. sed would try and match characters against the range specified in [...]. In UTF-8 locales, you can only encounter \x8f as part of a multi-byte character. You'll notice that . doesn't match on it either (and that's a POSIX requirement).
For instance:
sed 's/[eé\xa9]//'
would not make sense. é is a character (encoded as 0xc3 0xa9); 0xa9 is not a character but, as a byte, can be found inside a character (like é); and e is a character (encoded as 0x65). You can't expect sed to somehow be able to match 0xa9 both inside a character and as a byte.
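To see the encoding described above for yourself, here is a small sketch that dumps the bytes of é with od. The variable name e_acute_hex is only illustrative, and the character is written as octal escapes so the result doesn't depend on how this text is encoded.

```shell
# "é" in UTF-8, spelled as octal escapes (\303\251 = 0xc3 0xa9).
# od -An -tx1 prints each byte in hex; tr strips the spacing.
e_acute_hex=$(printf '\303\251' | od -An -tx1 | tr -d ' \n')
printf '%s\n' "$e_acute_hex"   # c3a9
```

The second byte, 0xa9, only exists here as half of é, which is why it can't be listed on its own in a bracket expression in a UTF-8 locale.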
To match arbitrary byte data with a text utility like sed, you'll want to use a locale where characters are bytes; the C locale (LC_ALL=C) is the typical choice:
LC_ALL=C sed 's/12[\x8f\x9f]//g'
Or portably:
LC_ALL=C sed "$(printf 's/12[\217\237]//g')"
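As a quick check of the portable form, the sketch below feeds it a made-up sample containing both bytes. The sample input and the variable name result are assumptions for illustration, not part of the original question.

```shell
# Sample input: a 1 2 0x8f b 1 2 0x9f c, built with octal escapes.
# In the C locale each byte is a character, so the bracket expression
# can list 0x8f (\217) and 0x9f (\237) directly.
result=$(printf 'a12\217b12\237c\n' |
  LC_ALL=C sed "$(printf 's/12[\217\237]//g')")
printf '%s\n' "$result"   # abc
```

Both 12\x8f and 12\x9f are removed, leaving only the ASCII letters.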
Note that you can't expect to portably process with sed data that contains NUL characters, that doesn't end in a newline character, or where newline characters are more than a few kilobytes apart. Use perl -p/-n instead in that case.
echo -e "a12\x8fb12\x9f" | sed -e 's_12[\x8f]__g' | xxd -ps | sed 's/../\0 /g' | sed -r 's/31 32 (8f|9f) ?//g' | xxd -r -ps | xxd
– LatinSuD Jun 26 '14 at 13:08
LC_ALL=C sed..., \x8f by itself won't make a character in a UTF-8 locale.
– Stéphane Chazelas Jun 26 '14 at 13:14