41

When adding a new user, how is the string validated?

I suppose there is a regular expression. What is that regular expression?

3 Answers

39

Sorry for necrobumping this almost 4-year-old question, but it comes up pretty high on Internet search results and it warrants a little more attention.

A more accurate regex is (yes, I know, despite the man page):

^[a-z_]([a-z0-9_-]{0,31}|[a-z0-9_-]{0,30}\$)$

Hopefully that helps some of those searching.

To break it down:

  1. It should start (^) with only a lowercase letter or an underscore ([a-z_]). This occupies exactly 1 character.

  2. Then it should be one of either (( ... )):

    1. From 0 to 31 characters ({0,31}) of letters, numbers, underscores, and/or hyphens ([a-z0-9_-]),

    OR (|)

    2. From 0 to 30 characters of the above plus a US dollar sign symbol (\$) at the end,

    and then

  3. No more characters past this pattern ($).
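
As a rough illustration of the breakdown above, here is a minimal Python sketch that applies the pattern. The names `PORTABLE_USERNAME` and `is_portable_username` are made up for this example. It uses `re.fullmatch` because in Python an unescaped `$` also matches just before a trailing newline, so the anchors alone would let `"root\n"` through.

```python
import re

# The pattern from this answer, unchanged. The constant name is hypothetical,
# chosen only for this sketch.
PORTABLE_USERNAME = re.compile(r"^[a-z_]([a-z0-9_-]{0,31}|[a-z0-9_-]{0,30}\$)$")


def is_portable_username(name: str) -> bool:
    """Return True if `name` matches the pattern in full."""
    # fullmatch() requires the whole string to match; with match() or search(),
    # Python's `$` would also accept a name followed by a single newline.
    return PORTABLE_USERNAME.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ["root", "_apt", "machine$", "9lives", "Admin", "a" * 33]:
        print(f"{candidate!r}: {is_portable_username(candidate)}")
```

Other engines treat `$` and trailing newlines differently, so the `fullmatch` choice is specific to Python and not part of the pattern itself.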

For those unfamiliar with regular expressions, you may ask why the dollar sign has a backslash in 2.2 but not in 3. In most (all?) regex variants, an unescaped dollar sign marks the end of the string (or line, etc.), so it has to be escaped when it is meant as a literal character in the string. The escape character depends on the engine being used, although I can't think off the top of my head of a regex engine that doesn't use a backslash.
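
To make the difference between the escaped and the unescaped dollar sign concrete, here is a small Python sketch. The sample string `svc$` and the variable names are invented for the illustration; other engines may differ in details such as multiline handling.

```python
import re

sample = "svc$"

# Escaped: `\$` is a literal dollar sign, found at index 3 of the sample.
literal = re.search(r"\$", sample)

# Unescaped: `$` is an end-of-string anchor, so `c$` means "a c at the very end".
# The sample ends with a dollar sign, not a c, so there is no match.
anchored = re.search(r"c$", sample)

# Both together: a literal dollar sign that is also the last character.
both = re.search(r"\$$", sample)

if __name__ == "__main__":
    print(literal.start(), anchored, both.group())
```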

Note that Debian and Ubuntu relax some of the restrictions that a fully POSIX/upstream-shadow-compliant username has to meet (for instance, and I don't know if this has been fixed, they allow the username to start with a number – which is actually what caused this bug). If you want to guarantee cross-platform compatibility, I'd recommend the above regex rather than whatever passes or fails the check in Debian, Ubuntu, and others.

  • Great answer. Can easily be applied also in Java using java.util.regex.Pattern.matches("^[a-z_]([a-z0-9_-]{0,31}|[a-z0-9_-]{0,30}\\$)$", user); – dokaspar May 25 '18 at 06:31
  • It should be [abcdefghijklmnopqrstuvwxyz] instead of [a-z]. [a-z] in many regexp engines also matches things like é, œ or even sometimes multi-character collating elements like dsz in Hungarian locales. – Stéphane Chazelas Jul 23 '18 at 06:54
  • Linux usernames do not accept Unicode (unless they are explicitly configured to break POSIX compliance). This check should be done outside of the regex, as it's an input/environment/localization validation, not a string validation. Further, I'd love to hear an example of a regex engine that does this. All ones I know of match on ASCII and one has to explicitly enable Unicode, if it's even supported. – brent saner Jul 24 '18 at 17:44
  • Don’t apologize for adding value to a thread, even if it is after years of inactivity. But I’m struggling to see just how much value you have added. It seems to me that your regex is equivalent to the one in Malte Skoruppa’s answer (i.e., in the man page) except yours incorporates the length requirement. One might argue that your version is more compact and efficient (if only in terms of total space); others might say that your version commingles … (Cont’d) – G-Man Says 'Reinstate Monica' Apr 21 '21 at 22:12
  • (Cont’d) …  the pattern requirement and the length requirement, which could (should?) more simply be handled separately (i.e., the regex from the man page + a call to strlen()). Note that cuonglm’s answer shows the pattern check and the length check being done in separate routines, which seems more modular — a future programmer can change the #define USER_NAME_MAX_LENGTH and not need to mess with the regex. … … … … … … … … P.S. Did you construct your regex yourself? If yes, good job. If not, you should say where you got it. … (Cont’d) – G-Man Says 'Reinstate Monica' Apr 21 '21 at 22:12
  • (Cont’d) … In your comment responding to Stéphane Chazelas, you say that the check for the presence of Unicode (non-ASCII) characters “should be done outside of the regex”. That’s very similar to my hypothetical argument about the string length — it should be done outside the regex because it's an input validation, not a string validation. ISTM that you are holding mutually contradictory points of view; can you reconcile them? – G-Man Says 'Reinstate Monica' Apr 21 '21 at 22:12
  • @G-ManSays'ReinstateMonica' the length is tied to the allowed char match per POSIX spec. it'd complicate the regex if the length was checked out of the pattern (note the ending dollar sign literal - only one is allowed if 30 chars, otherwise disallowed.)

    yes, constructed the pattern myself- thanks.

    – brent saner Apr 26 '21 at 06:43
  • @G-ManSays'ReinstateMonica' I personally found this regex more useful than the documented one. IMO any talk about "modularity" and "input vs string validation" is moot; this is simple enough that it's perfectly readable and obvious to anyone who knows scripting and regex. If someone is not familiar with scripting/regex, it'll be inscrutable no matter what. He gets an upvote from me. – Alvin Thompson Sep 29 '23 at 15:50
  • @AlvinThompson Bit late, but thanks! :) To clarify some things I realize I never fully addressed (...6 years after making the answer), 1.) the regex pattern I provided matches the POSIX spec for portable usernames, which imposes a characterset limit, and shadow defaults/older UNIX for character length limit (newer shadow allows USER_NAME_MAX_LENGTH overriding). 2.) [A-Za-z] should always match the expected ASCII char set because they're sequential in both ASCII and Unicode -- that is to say, the range sets are 0x41-0x5A(A-Z), 0x61-0x7A (a-z) in ASCII and Unicode both. – brent saner Jan 29 '24 at 14:19
18

From the man page of useradd (8):

It is usually recommended to only use usernames that begin with a lower case letter or an underscore, followed by lower case letters, digits, underscores, or dashes. They can end with a dollar sign. In regular expression terms: [a-z_][a-z0-9_-]*[$]?

On Debian, the only constraints are that usernames must neither start with a dash ('-') nor contain a colon (':') or a whitespace (space: ' ', end of line: '\n', tabulation: '\t', etc.). Note that using a slash ('/') may break the default algorithm for the definition of the user's home directory.

Usernames may only be up to 32 characters long.

So, there's a general recommendation. The actual constraints depend on the specifics of your implementation / distribution. On Debian-based systems, apparently there are no very hard constraints. In fact, I just tried useradd '€' on my Ubuntu box, and it worked. Of course, this may break some applications that do not expect such unusual usernames. To avoid such problems, it is best to follow the general recommendation.
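
To make the contrast tangible, here is a hedged Python sketch of both rule sets as quoted above. All names (`RECOMMENDED`, `MAX_LENGTH`, `follows_recommendation`, `passes_debian_rules`) are invented for this example, and the non-empty check in the second function is my own assumption. The Debian function leaves out the length limit because the quoted text does not say whether non-ASCII names are measured in bytes or characters.

```python
import re

# Pattern quoted from the useradd(8) man page above.
RECOMMENDED = re.compile(r"[a-z_][a-z0-9_-]*[$]?")
# The 32 comes from the man page text; the constant name is made up here.
MAX_LENGTH = 32


def follows_recommendation(name: str) -> bool:
    """Pattern check and length check kept as two separate steps."""
    if len(name) > MAX_LENGTH:
        return False
    return RECOMMENDED.fullmatch(name) is not None


def passes_debian_rules(name: str) -> bool:
    """Only the Debian constraints quoted above, plus a non-empty check
    (the non-empty check is an assumption of this sketch)."""
    if not name or name.startswith("-"):
        return False
    # No colon and no whitespace anywhere in the name.
    return ":" not in name and not any(ch.isspace() for ch in name)
```

Under these assumptions, a name like `€` passes the second check but not the first, which matches what I saw on my Ubuntu box.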

17

The general rule for a username is that its length must not exceed 32 characters. Beyond that, what counts as a valid username depends on your distribution.

In Debian, shadow-utils 4.1, there is an is_valid_name function in chkname.c:

static bool is_valid_name (const char *name)
{
    /*
     * User/group names must match [a-z_][a-z0-9_-]*[$]
     */
    if (('\0' == *name) ||
        !((('a' <= *name) && ('z' >= *name)) || ('_' == *name))) {
        return false;
    }

    while ('\0' != *++name) {
        if (!(( ('a' <= *name) && ('z' >= *name) ) ||
              ( ('0' <= *name) && ('9' >= *name) ) ||
              ('_' == *name) ||
              ('-' == *name) ||
              ( ('$' == *name) && ('\0' == *(name + 1)) )
             )) {
            return false;
        }
    }

    return true;
}

The length of the username is checked before that, in the calling function:

bool is_valid_user_name (const char *name)
{
    /*
     * User names are limited by whatever utmp can
     * handle.
     */
    if (strlen (name) > USER_NAME_MAX_LENGTH) {
        return false;
    }

    return is_valid_name (name);
}
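
For experimenting without compiling anything, the two C functions above can be ported to Python roughly as follows. This is only a sketch: it works on `bytes` to mirror how C sees the name, and the value 32 for `USER_NAME_MAX_LENGTH` is an assumption, since the `#define` is not shown here.

```python
# Assumed value; the #define itself is not shown in the excerpt above.
USER_NAME_MAX_LENGTH = 32


def is_valid_name(name: bytes) -> bool:
    """Byte-by-byte port of the C is_valid_name() shown above."""
    # Empty names are rejected, as is any first byte outside [a-z_].
    if not name:
        return False
    first = name[0:1]
    if not (b"a" <= first <= b"z" or first == b"_"):
        return False

    # Remaining bytes: [a-z0-9_-], or a dollar sign in the last position only.
    for i in range(1, len(name)):
        ch = name[i:i + 1]
        is_last = i == len(name) - 1
        if not (
            b"a" <= ch <= b"z"
            or b"0" <= ch <= b"9"
            or ch in (b"_", b"-")
            or (ch == b"$" and is_last)
        ):
            return False
    return True


def is_valid_user_name(name: bytes) -> bool:
    """Length check first, then the character check, as in the C code."""
    if len(name) > USER_NAME_MAX_LENGTH:
        return False
    return is_valid_name(name)
```

Because the comparison is done on raw bytes, a UTF-8 encoded `é` is rejected regardless of locale, which is the same behaviour as the explicit `'a' <= *name` comparisons in the C version.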
cuonglm