
I'm trying to verify whether a subdomain entered by a user is valid, but whatever I pass in, it's never treated as valid. I know the regex is OK, so the problem must be my "if" logic; however, I'm new to shell/bash.

#!/bin/bash
#

echo Enter the subdomain\'s name to configure.
read SUBDOMAIN

if [[ ! $SUBDOMAIN =~ [A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])? ]]; then
    echo "$SUBDOMAIN is not a valid domain"
fi

Examples:
Would be accepted (regular subdomain names): test
Would not be accepted (invalid subdomain name): -
Would not be accepted (invalid subdomain name): (Empty)
Would not be accepted (invalid subdomain name): #$??&@#&?$##$

I would prefer using shell, but the parentheses in the regex make the script throw an error.

I'm not sure if it can be done with grep, but I never understood how to use grep and it always confused me.

1 Answer


If you're trying to match "alphanumeric" followed by "alphanumeric or dash", ensuring there's not a dash at the end, such that there is a total of 1..63 characters (the DNS limit for a single label), this RE will work for you:

^[[:alnum:]](([[:alnum:]]|-){0,61}[[:alnum:]])?$

This binds to the beginning and end of the string, so the RE must match the string in its entirety.

  • Start of string ^
  • A single alphanumeric, any case [[:alnum:]]
  • An optional block (bracketed (...) and terminated with ?)
    • [[:alnum:]] or a dash -, repeated 0..61 times
    • [[:alnum:]]
  • End of string $
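A minimal sketch of the RE in use, assuming bash 3.2 or later. Following the bash manual's suggestion, the pattern lives in a variable so characters that are special to the shell, such as ( ) and |, need no quoting, and the variable is left unquoted after =~ (a quoted "$re" would be matched as a literal string). The helper names is_valid_subdomain, make_label and check are hypothetical. It runs the question's examples plus a few edge cases for the dash and length rules in the breakdown above.

```bash
#!/bin/bash
# Hypothetical helper: returns 0 when $1 is a valid subdomain label, 1 otherwise.
is_valid_subdomain() {
    # Keeping the RE in a variable avoids shell quoting problems with ( ) | ?
    local re='^[[:alnum:]](([[:alnum:]]|-){0,61}[[:alnum:]])?$'
    # $re must stay unquoted: a quoted "$re" would be matched as a literal string.
    [[ $1 =~ $re ]]
}

# Hypothetical helper: print a string made of $1 'a' characters.
make_label() {
    local s
    printf -v s '%*s' "$1" ''   # $1 spaces
    printf '%s' "${s// /a}"     # replace every space with 'a'
}

# Hypothetical helper: report whether $1 is accepted.
check() {
    if is_valid_subdomain "$1"; then
        echo "accepted: '$1'"
    else
        echo "rejected: '$1'"
    fi
}

# The examples from the question
for candidate in 'test' '-' '' '#$??&@#&?$##$'; do
    check "$candidate"
done

# Dashes are only allowed in the middle
for candidate in 'a-b' 'ab-' '-ab'; do
    check "$candidate"
done

# Length limits: 1 and 63 characters pass, 64 does not
for len in 1 63 64; do
    label=$(make_label "$len")
    if is_valid_subdomain "$label"; then
        echo "length $len: accepted"
    else
        echo "length $len: rejected"
    fi
done
```

To plug it into the question's script, the test becomes if ! is_valid_subdomain "$SUBDOMAIN"; then echo "$SUBDOMAIN is not a valid subdomain"; fi. Reading the input with read -r SUBDOMAIN instead of plain read also keeps backslashes in the input as typed.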

As has been recommended in the comments under this answer, I should point out that the [[:alnum:]] character class is affected by the current locale. If you want to ensure that it matches only "ASCII" A-Z, a-z and 0-9, you need to ensure the match runs with LC_ALL=C; setting LANG=C alone is not enough if LC_ALL or LC_CTYPE is set, because those take precedence over LANG. Otherwise you may find that additional characters are accepted, such as á é ø ß and others.
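A minimal sketch of forcing the C locale for just this match, assuming bash, which re-applies the locale as soon as LC_ALL is assigned. It sets LC_ALL rather than LANG because LC_ALL overrides LANG and every other LC_* variable. The function body is wrapped in ( ... ) so it runs in a subshell, and the C locale does not leak into the rest of the script. The name is_ascii_subdomain is hypothetical.

```bash
#!/bin/bash
# Hypothetical helper: same RE as the answer, but matched in the C locale so
# [[:alnum:]] covers only A-Z, a-z and 0-9.
is_ascii_subdomain() (
    # The ( ) body runs in a subshell, so this LC_ALL setting ends with it.
    LC_ALL=C
    re='^[[:alnum:]](([[:alnum:]]|-){0,61}[[:alnum:]])?$'
    [[ $1 =~ $re ]]
)

for candidate in 'test' 'aábé'; do
    if is_ascii_subdomain "$candidate"; then
        echo "accepted: $candidate"
    else
        echo "rejected: $candidate"
    fi
done
```

In the C locale the bytes of á and é are not alphanumeric, so aábé is rejected regardless of the user's own locale settings.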

Chris Davies
  • Thanks friend! Your regex looks much better! I just have to change the regex a bit so subdomains can't end with a dash as well, and it's all good :) – NaturalBornCamper Apr 30 '18 at 16:21
  • @NaturalBornCamper that's actually a little more complicated than it sounds – Chris Davies Apr 30 '18 at 16:23
  • Nope, what you gave me got me started, I just changed your answer a bit and it's working: if [[ ! $SUBDOMAIN =~ ^[:alnum:]{0,61}[[:alnum:]]$ ]]; – NaturalBornCamper Apr 30 '18 at 16:26
  • @NaturalBornCamper that will fail with a single character entry. It will also accept a 63 character string. Please see the amended answer for my suggestion. – Chris Davies Apr 30 '18 at 16:27
  • Oh wow you're right, I totally missed that, thanks heaps! – NaturalBornCamper Apr 30 '18 at 16:32
  • A subdomain like aábé will be accepted in a default utf8 locale. –  May 01 '18 at 04:13
  • @Isaac that's good. IDNs are permitted these days. – Chris Davies May 01 '18 at 06:13
  • @roaima From RFC 5890 4.6. Legacy IDN Label Strings The URI Standard [RFC3986] and a number of application specifications (e.g., SMTP [RFC5321] and HTTP [RFC2616]) do not permit non-ASCII labels in DNS names used with those protocols, i.e., only the A-label form of IDNs is permitted in those contexts. It sounds reasonable to limit to ASCII labels (even those IDN punycode strings that expand to Unicode characters). Or, at least, for web pages (HTTP) name addresses (more than 95% of internet on present days). –  May 01 '18 at 07:17
  • @isaac it's probably right to mention that explicitly, but as we don't know the OP's application I don't believe we should assume too much about the intended use. – Chris Davies May 01 '18 at 07:31
  • @roaima Since you are writing an answer about what you do know, it follows that it is reasonable that you should make a note about the a-z ranges matching many UNICODE characters and not leave that hidden. –  May 01 '18 at 07:40