
I feel like this is so obvious that searching on the Internet doesn't show any results about my problem. I'm looking at the root password in /etc/shadow, which looks something like:

$6$Etg2ExUZ$F9NTP7omafhKIlqaBMqng1.....

The password hash is what comes after the third $ sign. It uses sha512 hashing, but it's not displayed as hex. My question is: how does the output of a sha512 hash, which is usually shown in hex, get converted to this string:

F9NTP7omafhKIlqaBMqng1....

with non hex characters?

slm
  • 369,824
Andrés
  • 31
  • Overview of /etc/shadow contents + passwords - https://www.slashroot.in/how-are-passwords-stored-linux-understanding-hashing-shadow-utils - shows how to do it manually - https://unix.stackexchange.com/questions/81240/manually-generate-password-for-etc-shadow. – slm Aug 19 '18 at 03:04
  • Man page on crypt() - https://en.wikipedia.org/wiki/Crypt_(C). – slm Aug 19 '18 at 03:11
  • Ulrich Drepper's original proposal for adding SHA256/512 - https://www.akkadia.org/drepper/SHA-crypt.txt – slm Aug 19 '18 at 03:13

2 Answers

3

The output of sha512 (or any hash) is a sequence of bits – 512 of them, in this case – which encode a (very large) number, and can be separated into bytes or whatever other divisions are desired.

Hexadecimal representations turn each four-bit chunk into a hexadecimal digit and are a common way of representing hashes to a user, but not inherent to the hash function itself. It could just as well be raw bytes, or decimal, or a long string of 0s and 1s. What's important is that the writer and reader agree on how the number's encoded.
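As a rough sketch of what "turning each four-bit chunk into a hexadecimal digit" means in C (the helper name to_hex is made up for illustration, and the caller is assumed to supply a large enough buffer):

```c
#include <stddef.h>

/* Write the hex form of `len` raw bytes into `out`.
   Assumption: `out` has room for 2 * len + 1 characters. */
void to_hex(const unsigned char *bytes, size_t len, char *out) {
    static const char digits[] = "0123456789abcdef";
    for (size_t i = 0; i < len; i++) {
        out[2 * i]     = digits[bytes[i] >> 4];   /* high four bits */
        out[2 * i + 1] = digits[bytes[i] & 0x0f]; /* low four bits */
    }
    out[2 * len] = '\0';
}
```

The same bytes could be passed to any other encoder instead; the hash function itself only produces the bytes.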

In this case, raw bytes won't work, because the shadow file uses newlines to separate records, so that byte isn't available, and it would be hard to read or copy for a person if it ended up with other odd characters in it. That's why ASCII encodings are generally used for hashes that you might see.

ASCII-represented hex, decimal, and binary are all fairly inefficient mechanisms, however: hex doubles the size of the raw bytes (4 bits of input to one ASCII byte of output). For shorter hashes that's less of an issue: MD5 is only 128 bits, or 16 bytes, so 32 hexadecimal digits, and that's manageable. For longer hashes it gets unwieldy fast: a 512-bit hash like this one would take 128 bytes just for the hash. Decimal or binary would be even worse, though it probably isn't hugely important how long these are in this case as long as everyone agrees.


For this specific case, man 3 crypt says that:

The characters in "salt" and "encrypted" are drawn from the set [a-zA-Z0-9./].

a-z (26), A-Z (26), 0-9 (10), . (1), and / (1) make 26+26+10+1+1=64 available characters in total, so a base-64 representation sounds like it's in use. That means each ASCII byte represents 6 bits (2^6 = 64) of the data: four bytes (32 bits) of base64 hold three bytes (24 bits) of the original data, so the output is only about 33% larger than the input. A 512-bit value needs 86 bytes to store in this encoding (512/6 rounded up).
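The size arithmetic can be checked with a small helper. This is only an illustrative sketch; the function name encoded_length is hypothetical and it simply rounds up the division:

```c
#include <stddef.h>

/* Number of output characters needed to write `bits` bits of data
   when each character carries `bits_per_char` bits, rounded up. */
size_t encoded_length(size_t bits, size_t bits_per_char) {
    return (bits + bits_per_char - 1) / bits_per_char;
}
```

With this, encoded_length(512, 4) gives the 128 characters of hex and encoded_length(512, 6) gives the 86 characters of a 64-character alphabet.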

Base64 is a good default when a) you need to store or transmit arbitrary binary data within ASCII and b) nobody will ever have to read it out loud. Both of those hold here, so it's a sensible choice. Hexadecimal representations are convenient when you might have to read or check the hash manually, because case is unimportant and there aren't that many distinct values. There is also a little-used, but standard, base32 encoding that sits in the middle (all upper case and digits), but there's not much reason to use it here.


You probably have a base64 tool installed which will do these conversions for you in both directions. It may use different bytes at the end than crypt does – the MIME base64 encoding uses + and / instead of . and /, for example – but you can see how it turns arbitrary input into slightly-longer ASCII-encoded output. There are also online tools to encode and decode, but for a password hash you're likely to get unprintable bytes and invalid byte sequences, so it may not be much help there.

Michael Homer
  • 76,565
  • No, base64 has UPPER before lower, so it is [A-Za-z0-9+/=] (and yes it uses 65 characters, not 64). Your quote from man 3 crypt is correct and it has lower before UPPER. Also note that there are many base64 variations. –  Aug 19 '18 at 02:39
  • Sets don't have an order. – Michael Homer Aug 19 '18 at 02:41
  • Really? That a byte value 0 is converted to a letter A instead of a letter a has much to do with the order of base64 conversion characters. That you choose to call that a set does not make it a Mathematical set without order. –  Aug 19 '18 at 02:47
  • It's a direct quote, Isaac, I didn't choose anything. – Michael Homer Aug 19 '18 at 02:49
  • The point is that you claim than any base 64 encoding tool will decode the password, that is not true. The string is not base64 encoded, not in any of its variations. –  Aug 19 '18 at 03:09
  • @Isaac - https://www.akkadia.org/drepper/SHA-crypt.txt. 22e explains it. – slm Aug 19 '18 at 03:21
  • I did not claim that; in fact, I noted that it wouldn't work for these, for multiple reasons. I am perfectly comfortable with the level of abstraction in this answer in relation to a question that presumes hexadecimal encoding is fundamental to SHA512 and is mystified by the appearance of other characters, and I don't intend to itemise the steps of the encoding algorithm in it. It's available to find for anyone interested once they know what they're looking at. – Michael Homer Aug 19 '18 at 03:23
  • @slm Thanks for the link. The linked page states in point 22 e that the string to convert the hash is ./0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz, in that order, with a dot representing a zero byte value. Even far away than what I remembered from the RFC base64 algorithm. –  Aug 19 '18 at 03:39
0

The output of a hash is a series of bytes, not letters. As some byte values have special meaning to most text programs, the bytes are usually encoded in some way. A common hash encoding is hex, with one letter per nibble (half a byte). Others are also possible (and also common). The one used in the shadow file is similar to base64 but uses the conversion string ./0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz: every 6 bits of the hash output become one letter from that list.

A 512-bit hash (64 bytes) takes 86 letters (fixed length) in this base64-like encoding, instead of the 128 letters needed for hex encoding.
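To make the 6-bits-per-letter mapping concrete, here is a minimal sketch that encodes one 24-bit group with the conversion string above. The function name encode_24bit is made up for illustration. Taking the lowest 6 bits first follows my reading of the reference implementation in Drepper's SHA-crypt.txt and should be treated as an assumption; the real crypt also reorders the digest bytes before this step, which is left out here.

```c
#include <stddef.h>

/* Conversion string quoted in the answer above. */
static const char crypt_alphabet[] =
    "./0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

/* Encode the 24-bit group b2:b1:b0 (b2 most significant) as `n` letters,
   emitting the lowest 6 bits first (assumed ordering, see lead-in).
   Returns a pointer just past the last letter written; no NUL is added. */
char *encode_24bit(unsigned b2, unsigned b1, unsigned b0, int n, char *out) {
    unsigned w = (b2 << 16) | (b1 << 8) | b0;
    while (n-- > 0) {
        *out++ = crypt_alphabet[w & 0x3f]; /* low 6 bits pick one letter */
        w >>= 6;
    }
    return out;
}
```

Under these assumptions, 64 bytes split into 21 groups of three bytes (4 letters each, 84 in total) plus one leftover byte (2 letters), which matches the 86 letters mentioned above.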

There is a direct C function to perform crypt encodings. You do not need to decode it, just compare the output of the following program to the actual value inside /etc/shadow.

File: passwd-sha512.c

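#define _XOPEN_SOURCE
#include <crypt.h>   /* glibc/libxcrypt declare crypt() here */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(int argc, char *argv[]) {
  /* Expect a password and a salt of at most 16 characters. */
  if (argc < 3 || strlen(argv[2]) > 16) {
    fprintf(stderr, "usage: %s password salt\n", argv[0]);
    fprintf(stderr, "salt must not be longer than 16 characters\n");
    return EXIT_FAILURE;
  }

  /* "$6$" + up to 16 salt characters + "$" + terminating NUL = 21 bytes. */
  char salt[21];
  snprintf(salt, sizeof salt, "$6$%s$", argv[2]);

  /* crypt() returns NULL on failure. */
  char *hash = crypt(argv[1], salt);
  if (hash == NULL) {
    perror("crypt");
    return EXIT_FAILURE;
  }

  printf("%s\n", hash);
  return EXIT_SUCCESS;
}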

to compile:

$ /usr/bin/gcc -o passwd-sha512 passwd-sha512.c -lcrypt

usage:

$ ./passwd-sha512 <password> <salt (16 chars max)>

From your example string $6$Etg2ExUZ$F9NTP7omafhKIlqaBMqng1.....:

$ ./passwd-sha512 password Etg2ExUZ
$6$Etg2ExUZ$k01JYPOzptT0enZUP........

Remember that passwords given on the command line can be read by other users. Adapt the C code to read the password from stdin or to use some other secure method.
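One possible way to read the password from stdin instead of argv is sketched below. The helper name read_password is hypothetical, and this version does not turn off terminal echo, so it only addresses the command-line exposure, not someone looking at the screen.

```c
#include <stdio.h>
#include <string.h>

/* Read one line from `in` into `buf` and strip the trailing newline.
   Returns 0 on success, -1 on end of file or read error. */
int read_password(FILE *in, char *buf, size_t size) {
    if (fgets(buf, (int) size, in) == NULL)
        return -1;
    buf[strcspn(buf, "\n")] = '\0'; /* cut at the newline, if any */
    return 0;
}
```

In the program above, the result of read_password(stdin, buf, sizeof buf) could then be passed to crypt() in place of argv[1], leaving only the salt on the command line.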

slm
  • 369,824