when we input the credentials in the login prompt, the hash of the password is computed and then that hash is compared with the hash stored somewhere, and if I am not mistaken that "somewhere" is the /etc/shadow.
This is broadly correct (and more correct than many simplified explanations on the web, so well done). More precisely, the data flow is:
1. Read some configuration files. I'll assume they result in the common case of a local account with the normal defaults. Other possibilities at this stage can result in querying a network account database, or doing non-password-based authentication, or (but this would be unusual) doing password-based authentication with a different location of the password hash.
2. Input the user's password P. (This may happen after the next step.)
3. Find the line with the desired user name in /etc/shadow and extract the “encrypted¹ password” field. Note that this step requires permission to read /etc/shadow.
4. Split the shadow field into two parts²: the configuration+salt S and the expected hash H. Pass P and S to the crypt library function¹, obtaining a result R. Compare R with H: if they're equal then the password authentication has succeeded, otherwise it has failed.
5. Continue with the user authentication if there are non-password-based methods.
6. If the authentication has succeeded, log the user in, otherwise error out.
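The lookup and comparison (finding the user's line, splitting the field into S and H, hashing P under S, comparing R with H) can be sketched in a few lines of Python. This is only an illustration under loud assumptions: `toy_crypt`, the `$toy$rounds=…$salt` setting format, `make_entry` and the sample database are all invented for this sketch, and PBKDF2 from `hashlib` stands in for the real crypt function (Python's own `crypt` module has been removed from recent versions).

```python
import hashlib
import hmac


def toy_crypt(password: str, setting: str) -> str:
    """Stand-in for crypt(3): hash `password` under `setting` (config + salt).

    The setting format "$toy$rounds=<n>$<salt>" is invented for this sketch.
    """
    _, scheme, rounds_part, salt = setting.split("$")
    if scheme != "toy" or not rounds_part.startswith("rounds="):
        raise ValueError("unsupported setting")
    rounds = int(rounds_part[len("rounds="):])
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt.encode(), rounds)
    return digest.hex()


def find_hash_field(shadow_text: str, user: str):
    """Return the second colon-separated field of the user's line, or None."""
    for line in shadow_text.splitlines():
        fields = line.split(":")
        if len(fields) >= 2 and fields[0] == user:
            return fields[1]
    return None


def check_password(shadow_text: str, user: str, password: str) -> bool:
    field = find_hash_field(shadow_text, user)
    if field is None:
        return False  # no such user
    # S is everything before the last '$', H is what follows it.
    setting, _, expected = field.rpartition("$")
    try:
        result = toy_crypt(password, setting)
    except ValueError:
        return False  # locked account ('*' or '!') or unknown format
    # Constant-time comparison of R with H.
    return hmac.compare_digest(result.encode(), expected.encode())


def make_entry(user: str, password: str, rounds: int = 1000, salt: str = "Zx3kQ9") -> str:
    """Build a shadow-style line for the demo (hypothetical helper)."""
    setting = f"$toy$rounds={rounds}${salt}"
    return f"{user}:{setting}${toy_crypt(password, setting)}:19600:0:99999:7:::"


SHADOW = "\n".join([
    make_entry("alice", "correct horse"),
    "daemon:*:19600:0:99999:7:::",
])

if __name__ == "__main__":
    print(check_password(SHADOW, "alice", "correct horse"))
    print(check_password(SHADOW, "alice", "wrong guess"))
```

One difference from the real interface: crypt(3) returns the setting and the hash together in one string, so real callers usually compare the result against the whole shadow field rather than against H alone.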
On a typical embedded Linux system, all these steps happen inside the same program: login for a console login, su or sudo when elevating privileges, dropbear for SSH logins, etc. Step 1 may be completely omitted since embedded systems often don't have any runtime configurability at this point. The implementation of the crypt function comes from the system's standard library, e.g. musl. So the code to perform the calculation is stored in something like /lib/libc.so while the code that performs the surrounding configuration and database lookups is in /bin/login and such. The kernel is not involved in authentication except for providing basic input-output primitives (open file, read file, etc.). The kernel only gets involved more directly after authentication, to keep track of the privileges of the resulting process. (It may also be involved for temporary privilege escalation to read /etc/shadow if the authentication program doesn't run as root all the time.)
On a typical non-embedded Unix system, most of this process is subcontracted to the PAM library. PAM consists of a main library (/lib/libpam.so.0; here and elsewhere the exact path is system-dependent) as well as a number of auxiliary libraries and programs. I won't get into details because they're all part of the same software suite. The authenticating program calls a series of functions in the PAM library to authenticate the user as well as decide what to do after a successful authentication (session establishment). I think pam_authenticate is the function that performs parts of step 1 as well as steps 3–5 (I'm not sure as I'm not familiar with this side of PAM).
With PAM, steps 3–4 (finding the password hash and validating the password against it) are specifically handled in /lib/security/pam_unix.so, the PAM module for traditional password-based authentication. The pam_unix module runs in the context of the process that performs the authentication, which might not run with enough privileges to read /etc/shadow (but needs to run with enough privileges to be able to get those privileges³). To minimize the amount of code that can access the password hashes, this part runs in a dedicated program, unix_chkpwd. This program is the one that reads the password hash from /etc/shadow, calls the crypt function and verifies its output.
¹ A misleading name since the password is hashed, not encrypted — you can't “decrypt” the content to find the original password.
² The interface is slightly weird for historical reasons. The field in /etc/shadow contains 4 parts: an algorithm identifier, some cost parameters, a salt string, and the expected output string. The algorithm identifier selects which password hashing algorithm is used. Note that password hashing algorithms are not ordinary hash functions, despite what the name suggests. See e.g. “Do all Linux distributions use the same cryptographic hash function?” for more information. The cost parameters depend on the algorithm; a higher cost makes normal authentication slower but also makes cracking the password harder if an attacker manages to retrieve the hash. The salt is unique to each entry and protects against multi-account attacks (e.g. trying to get into the account of the employee with the weakest password, to get a foothold into an organization). Internally, the crypt function uses the algorithm identifier to determine which auxiliary function to call, and on some systems this auxiliary function can live in a different library.
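To make the four parts concrete, here is a minimal sketch of a parser for the common `$id$[params$]salt$output` layout. The names `ShadowHash`, `ALGORITHMS` and `parse_hash_field` are invented for this illustration, the identifier table is not exhaustive, and the sample strings are placeholders rather than real hashes. Some formats (bcrypt, for example) pack the salt and output together differently and are not handled here.

```python
from typing import NamedTuple, Optional

# A few identifiers commonly seen in /etc/shadow (not an exhaustive list).
ALGORITHMS = {
    "1": "MD5-crypt",
    "5": "SHA-256-crypt",
    "6": "SHA-512-crypt",
    "y": "yescrypt",
}


class ShadowHash(NamedTuple):
    algorithm: str
    params: Optional[str]  # cost parameters, None when defaults apply
    salt: str
    expected: str          # the expected output string


def parse_hash_field(field: str) -> ShadowHash:
    """Split a '$id$[params$]salt$output' field into its four parts."""
    parts = field.split("$")
    # A well-formed field starts with '$', so parts[0] is the empty string.
    if len(parts) not in (4, 5) or parts[0] != "":
        raise ValueError("not a $id$[params$]salt$output field")
    algorithm = ALGORITHMS.get(parts[1], f"unknown ({parts[1]})")
    if len(parts) == 5:
        return ShadowHash(algorithm, parts[2], parts[3], parts[4])
    return ShadowHash(algorithm, None, parts[2], parts[3])


if __name__ == "__main__":
    # Placeholder values, not output of a real hashing run.
    print(parse_hash_field("$6$rounds=5000$examplesalt$EXAMPLEOUTPUT"))
    print(parse_hash_field("$1$examplesalt$EXAMPLEOUTPUT"))
```

A locked or passwordless entry such as `*` or `!` does not match this layout, so the sketch rejects it with `ValueError` instead of guessing.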
³ Typically, the program (login, su, …) starts with root privileges and keeps root as its saved user ID (at least in a part of itself), but changes its effective user ID to a dedicated system user (when logging in) or to the invoking user (when elevating privileges). This minimizes the risk from a security hole in the login program that allows an attacker to get partial control of the program (e.g. read files) but not the ability to execute arbitrary code. Elevating privileges requires calling a dedicated system call such as seteuid.
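The pattern of switching the effective user ID only around the privileged read can be sketched with Python's `os.seteuid`. This is a rough illustration, not how login or su are actually written: `effective_uid` and `read_protected` are names invented here, and the sketch assumes the process already holds the target ID as its real, effective or saved user ID (otherwise `seteuid` fails with a permission error).

```python
import os
from contextlib import contextmanager


@contextmanager
def effective_uid(uid: int):
    """Temporarily switch the effective user ID, then switch back."""
    previous = os.geteuid()
    os.seteuid(uid)  # allowed only for the real, effective or saved user ID
    try:
        yield
    finally:
        os.seteuid(previous)


def read_protected(path: str, privileged_uid: int = 0) -> bytes:
    """Read `path` with the effective user ID raised only for the read itself."""
    with effective_uid(privileged_uid):
        with open(path, "rb") as handle:
            return handle.read()


if __name__ == "__main__":
    # Real, effective and saved user IDs of the current process (Linux).
    print(os.getresuid())
```

In a program started as root, the idea would be to call `os.seteuid` with an unprivileged ID early on, which leaves root as the saved user ID, and later call something like `read_protected("/etc/shadow")` for the one operation that needs it.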
sshd, that's responsible for PAM support, not the client; but presuming the daemon is what you mean...) – Charles Duffy Sep 02 '23 at 19:46

sshd is a special case 'cos it can optionally use PAM or not (the usePAM entry in sshd_config). If you do an ls /etc/pam.d you'll see lots of entries (eg su, sudo, screen, passwd, login and more). These are the PAM configs for those programs. – Stephen Harris Sep 02 '23 at 20:38

% ldd /bin/login | grep libpam
libpam.so.0 => /lib64/libpam.so.0 (0x00007f87be58e000)
libpam_misc.so.0 => /lib64/libpam_misc.so.0 (0x00007f87be38a000)
– Stephen Harris Sep 03 '23 at 12:24