
I just wanted to write a script and needed to extract the size of directories. There I discovered a strange effect which I don't understand:

I used the "du" command:

> x=$(du mydir)
> echo $x
8192 mydir

So far ok. But now I want to extract the size by removing all chars from x beginning with the first whitespace. But then I get

> echo ${x%% *}
8192 mydir

instead of just 8192.

So I checked that with another variable, not generated by "du":

> y="8192 mydir"
> echo ${y%% *}
8192

Why does this work for y but not for x? I also checked that x and y are identical strings.

I'm really puzzled here and would be very pleased if anybody has an explanation for this.

Humpri

2 Answers

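If you double-quote your variable (echo "$x") you'll see that the gap is visibly wider than one space. It is in fact a tab character. The unquoted echo $x hid it because word splitting breaks the value at the tab and echo joins the pieces with a single space. On some systems you can see the tab more clearly with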

echo "$x" | cat -e

You can therefore match the tab (or space) with this construct, which removes the longest sequence of "space-or-tab followed by anything" that's bound to the end of the string value in $x

echo "${x%%[[:blank:]]*}"
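
The effect can be reproduced without running du at all. This is a minimal sketch, assuming a POSIX-like shell such as bash; the sample value is built by hand with printf and the variable names space_only and size are only for illustration:

```shell
# Build a du-like value by hand: size, a tab, then the name.
x=$(printf '8192\tmydir')

# Unquoted: word splitting breaks the value at the tab and echo
# rejoins the pieces with one space, so the tab is invisible here.
echo $x

# "%% *" looks for a space, finds none, and removes nothing.
space_only=${x%% *}

# "%%[[:blank:]]*" matches a space or a tab, so only the size remains.
size=${x%%[[:blank:]]*}

printf '%s\n' "$size"
```

The last line prints 8192, while space_only still holds the full value including the tab.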
Chris Davies

xxd or hd are good tools for investigating such things. Example:

$ du sample
32  sample

$ du sample | xxd -g 1
00000000: 33 32 09 73 61 6d 70 6c 65 0a                    32.sample.

If your hex notation is rusty you can use ascii:

$ ascii -x
   00 NUL    10 DLE    20      30 0    40 @    50 P    60 `    70 p 
   01 SOH    11 DC1    21 !    31 1    41 A    51 Q    61 a    71 q 
   02 STX    12 DC2    22 "    32 2    42 B    52 R    62 b    72 r 
   03 ETX    13 DC3    23 #    33 3    43 C    53 S    63 c    73 s 
   04 EOT    14 DC4    24 $    34 4    44 D    54 T    64 d    74 t 
   05 ENQ    15 NAK    25 %    35 5    45 E    55 U    65 e    75 u 
   06 ACK    16 SYN    26 &    36 6    46 F    56 V    66 f    76 v 
   07 BEL    17 ETB    27 '    37 7    47 G    57 W    67 g    77 w 
   08 BS     18 CAN    28 (    38 8    48 H    58 X    68 h    78 x 
   09 HT     19 EM     29 )    39 9    49 I    59 Y    69 i    79 y 
   0A LF     1A SUB    2A *    3A :    4A J    5A Z    6A j    7A z 
   0B VT     1B ESC    2B +    3B ;    4B K    5B [    6B k    7B { 
   0C FF     1C FS     2C ,    3C <    4C L    5C \    6C l    7C | 
   0D CR     1D GS     2D -    3D =    4D M    5D ]    6D m    7D } 
   0E SO     1E RS     2E .    3E >    4E N    5E ^    6E n    7E ~ 

From this you quickly find:

  • 33 => 3
  • 32 => 2
  • 09 => HT, or Horizontal Tab
  • 73 => s
  • ...

In other words: 32<HT>sample. A space is 0x20, so you would have gotten 33 32 20 73 ... if the separator had been a space.

Another tool is od, here using the -c option, (-a or -ac can also be nice):

$ printf %s "$x" | od -c
0000000   3   2  \t   s   a   m   p   l   e

This also shows you the actual contents, provided you quote the variable correctly.
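
To see how much the quoting matters, here is a small sketch that runs the same value through sed -n l, which prints a tab as \t and marks the end of the line with $. The value is built by hand with printf, and the variable names quoted and unquoted are only for illustration:

```shell
# Same shape as the du output: size, a tab, then the name.
x=$(printf '32\tsample')

# Quoted: the tab reaches sed intact and is displayed as \t.
quoted=$(printf '%s\n' "$x" | sed -n l)

# Unquoted: the shell splits at the tab and echo rejoins with a
# space, so sed never sees the tab at all.
unquoted=$(echo $x | sed -n l)

printf '%s\n%s\n' "$quoted" "$unquoted"
```

The first line of output shows the \t, the second shows a plain space.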

The solution is to use @roaima's example or something along the lines of

$ read -r size name <<< "$(du sample)"
$ echo "$size"
32
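
read works here because, with the default IFS, it splits on spaces and tabs alike. Below is a sketch of the same idea that avoids the bash-only here-string, assuming mktemp is available. The scratch directory and file are hypothetical, and -s and -k are added so that du prints a single summary line in KiB even when the directory has subdirectories:

```shell
# Hypothetical scratch directory with one small file in it.
dir=$(mktemp -d)
printf 'hello\n' > "$dir/file"

# read splits the line at the tab; everything after the first
# field ends up in the last variable, so names with spaces survive.
read -r size name <<EOF
$(du -sk "$dir")
EOF

printf 'size=%s name=%s\n' "$size" "$name"

# Clean up the scratch directory.
rm -rf -- "$dir"
```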

Depending on what you want, awk is also likely a good candidate.
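
As a rough sketch of that route: awk splits fields on any run of blanks by default, and cut uses the tab as its default delimiter, so either one picks out the size without pattern tricks. The scratch directory is hypothetical and mktemp is assumed to be available:

```shell
# Hypothetical scratch directory to measure.
dir=$(mktemp -d)

# cut: the default field delimiter is a tab, which is what du emits.
size_cut=$(du -sk "$dir" | cut -f1)

# awk: default field splitting treats tabs and spaces alike.
size_awk=$(du -sk "$dir" | awk '{ print $1 }')

printf '%s %s\n' "$size_cut" "$size_awk"

# Clean up the scratch directory.
rm -rf -- "$dir"
```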

ibuprofen