
A bash regular expression match fails for $'\x01' when the end-of-string anchor $ is used. All other byte values (seem to) compare correctly.

I am using GNU bash 4.1.5(1). Is this a bug, or is there another way to represent bytes in hex notation other than $'\...'? The notation itself does not seem to be the cause, though, because even a literal-character to literal-character comparison fails.

The failure only happens when $'\x01' immediately precedes the end-of-string anchor $.

Here are some examples:

echo 'non \x01 with ^ and $'
[[      3  =~ ^$'\x33'$ ]]; echo $?  # 0 
[[      3  =~ ^$'\063'$ ]]; echo $?  # 0 
[[ $'\x12' =~ ^$'\x12'$ ]]; echo $?  # 0 
[[ $'\002' =~ ^$'\x02'$ ]]; echo $?  # 0 

echo '\x01 with no ^ or $'
[[ $'\x01' =~  $'\x01'  ]]; echo $?  # 0 
[[ $'\x01' =~  $'\001'  ]]; echo $?  # 0 
[[       =~  $'\001'  ]]; echo $?  # 0   nb. Literal char does not render
[[       =~         ]]; echo $?  # 0   nb. Literal char does not render

echo '\x01 with ^ only'
[[ $'\x01' =~ ^$'\x01'  ]]; echo $?  # 0 
[[ $'\x01' =~ ^$'\001'  ]]; echo $?  # 0 
[[       =~ ^$'\001'  ]]; echo $?  # 0   nb. Literal char does not render
[[       =~ ^       ]]; echo $?  # 0   nb. Literal char does not render

echo '\x01 with ^ and $'
[[ $'\x01' =~ ^$'\x01'$ ]]; echo $?  # 1 
[[ $'\x01' =~ ^$'\001'$ ]]; echo $?  # 1 
[[       =~ ^$'\001'$ ]]; echo $?  # 1   nb. Literal char does not render
[[       =~ ^$      ]]; echo $?  # 1   nb. Literal char does not render

echo '\x01 with $ only'
[[ $'\x01' =~  $'\x01'$ ]]; echo $?  # 1 
[[ $'\x01' =~  $'\001'$ ]]; echo $?  # 1 
[[       =~  $'\001'$ ]]; echo $?  # 1   nb. Literal char does not render
[[       =~  $      ]]; echo $?  # 1   nb. Literal char does not render

echo '\x01 with $ only, but not adjacent to \x01'
[[ $'\x01'c =~  $'\x01'c$ ]]; echo $?  # 0 
[[ $'\x01'c =~  $'\001'c$ ]]; echo $?  # 0 
[[      c =~  $'\001'c$ ]]; echo $?  # 0   nb. Literal char does not render
[[      c =~  c$      ]]; echo $?  # 0   nb. Literal char does not render
Peter.O
  • Can you run the code through cat -v? It apparently uses the literal character ^A which disappears on StackOverflow. When I replaced the empty places with ^A, I got 0's everywhere (4.3.33(1)). – choroba Apr 08 '15 at 16:35
  • It works for me in GNU bash, version 4.3.30(1) – cuonglm Apr 08 '15 at 16:35
  • man ascii calls \001 start of heading - I wonder if that is relevant? Like - maybe bash is interpreting \001\000 to mean no heading or something? I dunno how bash encodes that stuff, but also there is a very interesting discussion on the gmane austin group lists about how different locales affect sort and collation orders - it's just that I can't pull it up because gmane has been pretty broken lately. – mikeserv Apr 08 '15 at 16:38
  • Oh, here is a question which has an answer that links to the discussion, actually, and in which Stéphane Chazelas sums it up quite nicely anyway. Not sure if it is relevant to this or not (maybe not - \001 should never be a multibyte char) - but it is about the general subject anyway. – mikeserv Apr 08 '15 at 16:43
  • I agree it is strange and looks like a bug to me. Same behaviour with bash 4.1.17(9), but it works ok with bash 4.3.11(1). Can you update your bash version? – gogoud Apr 08 '15 at 16:36
  • Here is a similar bug that was reported. – cuonglm Apr 08 '15 at 17:10
  • @gogoud Thanks for the comparison of two versions of bash. It is sounding like a bug - time for a bash update, and time to start using $'\x02' instead of my old favourite temp char $'\x01' – Peter.O Apr 08 '15 at 17:27

1 Answer


Yes, it was a bug in old versions of bash, fixed in bash-4.2.14 (that is, Bash 4.2 patch 14).

And here is the commit which makes the problem go away; make of it what you will.

What is CTLESC? It is defined in syntax.h as #define CTLESC '\001'. It's some sort of internal escape involved in expansion. It looks like the bug may be that your \x01 datum is being interpreted as if it were an internally generated CTLESC, or something like that.

commit 25db9a70d4c2ba5c43d4167f231bdd8d760d5a06
Author: Chet Ramey <chet.ramey@case.edu>
Date:   Tue Nov 22 20:02:46 2011 -0500

    Bash-4.2 patch 14

diff --git a/patchlevel.h b/patchlevel.h
index 636be1c..04b423b 100644
--- a/patchlevel.h
+++ b/patchlevel.h
@@ -25,6 +25,6 @@
    regexp `^#define[   ]*PATCHLEVEL', since that's what support/mkversion.sh
    looks for to find the patch level (for the sccs version string). */

-#define PATCHLEVEL 13
+#define PATCHLEVEL 14

 #endif /* _PATCHLEVEL_H_ */
diff --git a/pathexp.c b/pathexp.c
index 42f21e4..f239956 100644
--- a/pathexp.c
+++ b/pathexp.c
@@ -196,7 +196,7 @@ quote_string_for_globbing (pathname, qflags)
    {
      if ((qflags & QGLOB_FILENAME) && pathname[i+1] == '/')
        continue;
-     if ((qflags & QGLOB_REGEXP) && ere_char (pathname[i+1]) == 0)
+     if (pathname[i+1] != CTLESC && (qflags & QGLOB_REGEXP) && ere_char (pathname[i+1]) == 0)
        continue;
      temp[j++] = '\\';
      i++;
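
To check whether a particular bash is affected, a minimal sketch along these lines may help. It reuses the failing test from the question and prints the running shell's version and patch level from BASH_VERSINFO. The function name ctlesc_regex_ok is made up for this illustration; it is not part of bash.

```bash
#!/usr/bin/env bash

# Hypothetical helper (name invented for this sketch): succeeds when this
# bash matches a quoted \x01 that sits immediately before the $ anchor,
# which is the case reported as failing in the question.
ctlesc_regex_ok() {
  local soh=$'\x01'
  [[ $soh =~ ^$'\x01'$ ]]
}

# BASH_VERSINFO holds the major version, minor version and patch level
# in elements 0, 1 and 2 respectively.
printf 'bash %s.%s patch %s\n' \
  "${BASH_VERSINFO[0]}" "${BASH_VERSINFO[1]}" "${BASH_VERSINFO[2]}"

if ctlesc_regex_ok; then
  echo 'regex match on \x01 before $ works in this shell'
else
  echo 'regex match on \x01 before $ fails: this shell looks affected'
fi
```

If the check fails, comparing the printed patch level against 4.2 patch 14 shows whether the shell predates the commit above.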
Kaz