$ echo ABC | awk '$0 ~ /^[a-b]/'
ABC
$ echo ABC | awk '$0 ~ /^[a-a]/'
$ echo ABC | awk '$0 ~ /^a/'
$
You see: /[a-b]/ captures A, but /[a-a]/ or /a/ doesn't. Why?
It is a "locale" problem, I think.
In my locale, it_IT, the following snippet
if [[ a < A ]]; then
echo "a < A"
elif [[ a > A ]]; then
echo "a > A"
else
echo "a = A"
fi
if [[ b < A ]]; then
echo "b < A"
elif [[ b > A ]]; then
echo "b > A"
else
echo "b = A"
fi
shows
a < A
b > A
so that A is (surprisingly) between a and b, and therefore in the range [a-b].
Try executing
echo ABC | LC_COLLATE=C awk '$0 ~ /^[a-b]/'
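A minimal sketch of the same check, assuming a POSIX shell and any awk. It uses LC_ALL=C rather than LC_COLLATE=C, because LC_ALL takes precedence over the other LC_* variables, so the override still works if LC_ALL is already set in the environment. The variable names upper and lower are only illustrative.

```shell
# Run awk in the C locale, where [a-b] covers only the bytes 'a' and 'b'.
# LC_ALL overrides LC_COLLATE and LANG, so this does not depend on the
# caller's locale settings.
upper=$(echo ABC | LC_ALL=C awk '$0 ~ /^[a-b]/')   # 'A' is outside a-b: no match
lower=$(echo abc | LC_ALL=C awk '$0 ~ /^[a-b]/')   # 'a' is inside a-b: line is printed
printf 'upper=[%s] lower=[%s]\n' "$upper" "$lower"
```

In the C locale this should print upper=[] lower=[abc].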
Edit
The following command shows the collating order in your locale:
echo $(LC_COLLATE=C printf '%s\n' {A..z} | sort)
The output on my machine is
` ^ _ [ ] a A b B c C d D e E f F g G h H i I j J k K l L m M n N o O p P q Q r R s S t T u U v V w W x X y Y z Z
(I cannot tell from bash's manual page whether sequence expressions are expanded in locale collating order or not; it seems not.)
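To see just the three characters that matter here, a shorter comparison can help. This is a sketch assuming a POSIX shell with sort; the variable names c_order and cur_order are only illustrative. In the C locale the order is fixed by byte value, while the order in the current locale depends on your settings, so no output is claimed for it.

```shell
# Sort the three characters by byte value (C locale): A (65) < a (97) < b (98).
c_order=$(echo $(printf '%s\n' b A a | LC_ALL=C sort))
# Sort the same characters with whatever locale is currently active.
cur_order=$(echo $(printf '%s\n' b A a | sort))
echo "C locale:       $c_order"
echo "current locale: $cur_order"
```

If the second line shows A between a and b, the range [a-b] includes A in that locale.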
sort, join or the like, I start my scripts with export LC_COLLATE=C. Now I have to start scripts using awk this way too :)
– enzotib
Aug 24 '11 at 18:44
LC_COLLATE=C with your printf command in the edit?
– rozcietrzewiacz
Oct 24 '11 at 06:23
printf interpret the sequence {A..z} in a way independent of the particular locale (as the sentence following explains in some way: "cannot understand from bash's manual page if sequence expressions are expanded in locale collating order or not; it seems not").
– enzotib
Oct 24 '11 at 07:46
LC_ALL was set in the environment, then changing LC_COLLATE alone would have no effect.
– rozcietrzewiacz
Oct 24 '11 at 09:03