0

Command:

echo "HelloWorld==" | base64 -d | base64

Output:

HelloWorlQ==

Why is my d now a Q?

Edit:

I am not trying to start with arbitrary data and base64 encode it. My intention is to start with Base64 and end with Base64, having only produced a binary value in the interim.

Edit 2:

I have noticed that it does not happen if the input string, before padding, has a length that is a multiple of four characters, so I think it is an interaction with the padding somehow:

❯ echo 'abcdefghij==' | base64 -d | base64
abcdefghig==

❯ echo 'abcdefgh' | base64 -d | base64 
abcdefgh

Edit 3:

Removed confusing mention of the -i flag, which turned out to have nothing to do with my problem.

  • Related? https://unix.stackexchange.com/a/383711/117549 – Jeff Schaller Aug 25 '19 at 20:58
  • Thanks @JeffSchaller but I don't think so. Newline is not a valid base64 character, so it should be ignored. Also using echo -n doesn't change anything. – MatrixManAtYrService Aug 25 '19 at 21:02
  • please change your example so that it provides valid base64 input to base64 -id. e.g. echo SGVsbG8gV29ybGQhPT0K | base64 -id | base64. or, at least, the actual data you're piping. Piping Hello World!== into it just makes the question confusing. BTW, this also answers the question - whatever your actual input is, it is not valid base64. – cas Aug 26 '19 at 02:41
  • @cas I have removed references to, and usage of the -i flag, because I agree it was confusing. I'm leaving the invalid base64 because if it were suddenly valid then the question wouldn't make sense. – MatrixManAtYrService Aug 27 '19 at 01:51

3 Answers

5

HelloWorld== contains information which cannot be decoded, and it isn't technically valid Base64, because the unused trailing bits of the last character should be zero. The extra 1 bits it contains are ignored and lost when you run echo "HelloWorld==" | base64 -d.

To explain...

Base64 works with groups of 4 characters. Each character represents 6 bits, so each group of 4 decodes into 3 bytes (8 bits per byte). The only exception is the last group of 4 characters, whose decoded length depends on the number of = signs. The length of a padded Base64 string is always a multiple of 4.

  • no = sign: the last group decodes to 3 bytes
  • one = sign: the last group decodes to 2 bytes
  • two = signs (==): the last group decodes to 1 byte
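
As a rough sketch of that rule in plain POSIX shell (decoded_len is a made-up helper name, not part of any tool, and it assumes the input is well-formed, padded Base64):

```sh
# Hypothetical helper: how many bytes a padded Base64 string decodes to.
# Assumes the length is already a multiple of 4.
decoded_len() {
    s=$1
    groups=$(( ${#s} / 4 ))       # each group of 4 characters holds up to 3 bytes
    case $s in
        *==) pad=2 ;;             # two = signs: last group holds 1 byte
        *=)  pad=1 ;;             # one = sign: last group holds 2 bytes
        *)   pad=0 ;;             # no padding: last group holds 3 bytes
    esac
    echo $(( groups * 3 - pad ))
}

decoded_len 'abcdefgh'        # prints 6
decoded_len 'HelloWorld=='    # prints 7
```

The 7 agrees with the seven bytes that base64 -d produces for HelloWorld==.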

In your example, Hell and oWor are both valid, but ld== isn't. To understand why, see the lookup table at https://en.wikipedia.org/wiki/Base64

ld== should decode to only one byte because it has two = at the end. But ld decodes to 12 bits: 100101 011101. A byte is only 8 bits, so when you decode your string with base64 -d, only 100101 01 is converted into a byte, and the trailing 1101 is ignored completely.

Any Base64 string ending in == must use only the first two bits of its last character; the remaining four bits must be zero. That is, the only valid endings with == are A==, Q==, g== and w==.
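
To see which character an encoder would emit in that position, here is a small POSIX shell sketch. The function names (b64_index, b64_char, canonical_before_pad2) are invented for illustration, and it assumes the standard Base64 alphabet and a single valid Base64 character as input:

```sh
ALPHABET='ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/'

# Index (0-63) of one Base64 character: the length of the alphabet
# prefix that comes before it.
b64_index() {
    prefix=${ALPHABET%%"$1"*}
    echo "${#prefix}"
}

# Base64 character for an index (cut counts from 1, so add 1).
b64_char() {
    printf '%s' "$ALPHABET" | cut -c "$(( $1 + 1 ))"
}

# For the character directly before "==": keep its top two bits,
# zero the lower four, and print the resulting character.
canonical_before_pad2() {
    idx=$(b64_index "$1")
    b64_char "$(( idx & 48 ))"    # 48 is 110000 in binary
}

canonical_before_pad2 d    # prints Q
canonical_before_pad2 j    # prints g
```

This matches both examples in the question: the d became a Q and the j became a g.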

4

Yes, it's an interaction with the padding.

Let's look at your actual encoded data by decoding it, and (since it's not an ASCII string) convert it into binary:

$ base64 -d <<<'HelloWorld==' | xxd -b
00000000: 00011101 11101001 01100101 10100001 01101010 00101011  ..e.j+
00000006: 10010101                                               .

This is the data that HelloWorld== is the base64 encoding of. Philip Couling explains the intricacies of the decoding of the final ld== part and that, in a way, only a third of the data encoded by the d is actually used when decoding the data. Below I'm showing where the Q comes from when you re-encode the data.

Let's repeat that binary:

00011101 11101001 01100101 10100001 01101010 00101011 10010101

In groups of six bits (which is what the base64 encoder will use):

000111 011110 100101 100101 101000 010110 101000 101011 100101 01

This is padded with four zero bits at the end to make 10 complete 6-bit codes:

000111 011110 100101 100101 101000 010110 101000 101011 100101 010000

The 010000 is the Q you see when you re-encode the data (see the base64 table of codes).
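
The same step can be sketched with shell arithmetic for just the final byte, 10010101 (149 in decimal). encode_last_byte is a hypothetical helper written for this illustration, not how base64 is actually implemented:

```sh
ALPHABET='ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/'

# Hypothetical helper: Base64-encode a single trailing byte (0-255).
encode_last_byte() {
    byte=$1
    first=$(( byte >> 2 ))           # top six bits form the first code
    second=$(( (byte & 3) << 4 ))    # last two bits, padded with four zero bits
    c1=$(printf '%s' "$ALPHABET" | cut -c "$(( first + 1 ))")
    c2=$(printf '%s' "$ALPHABET" | cut -c "$(( second + 1 ))")
    echo "${c1}${c2}=="
}

encode_last_byte 149    # prints lQ==
```

The four low bits of the second code are always zero here, which is why the original d (011101) cannot come back.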

Kusalananda
  • 333,661
2

The piping is not in order. You should encode first, before you decode.

$ echo "Hello World!==" | base64 | base64 -id
Hello World!==

You were decoding input that is not valid base64.

Kusalananda
  • 333,661
  • Is HelloWorld== not a valid base64 encoded string? Shouldn't I be able to convert it to binary and back to base64 without losing anything? – MatrixManAtYrService Aug 25 '19 at 21:46
  • I thought the same until I read the whole question: My intention is to start with Base64 and end with Base64 (unfortunately this may have been added after your answer). – ctrl-alt-delor Aug 25 '19 at 22:13
  • 1
    It was added later--after I realized (thanks to this answer) that it looked like a simple transposition error on my part. – MatrixManAtYrService Aug 25 '19 at 22:23