Today I was validating commit 146806 to hosting_site.module from anarcat. To do that, I went back to the specification and came up with my own version of the pattern to see how it compares to anarcat's.
Here's the breakdown of my interpretation:
<digit> ::= any one of the ten digits 0 through 9
pattern = [0-9]
<letter> ::= any one of the 52 alphabetic characters A through Z in upper
case and a through z in lower case
pattern = [a-z] // note that the upper-case range is dropped because the
"/i" modifier makes preg_match case insensitive.
<let-dig> ::= <letter> | <digit>
pattern = [a-z0-9]
<let-dig-hyp> ::= <let-dig> | "-"
pattern = [a-z0-9-]
<ldh-str> ::= <let-dig-hyp> | <let-dig-hyp> <ldh-str>
pattern = [a-z0-9-]+
<label> ::= <letter> [ [ <ldh-str> ] <let-dig> ]
pattern = [a-z]([a-z0-9-]*[a-z0-9])?
<subdomain> ::= <label> | <subdomain> "." <label>
pattern = ([a-z]([a-z0-9-]*[a-z0-9])?\.?)+
<domain> ::= <subdomain> | " "
pattern = ([a-z]([a-z0-9-]*[a-z0-9])?\.?)+
$regex = "/^([a-z]([a-z0-9-]*[a-z0-9])?\.?)+$/i";
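To make the breakdown above easier to check, here is a minimal sketch that assembles the same pattern from the individual pieces and wraps it in a validation function. The function names `build_domain_regex()` and `domain_is_valid()` are made up for illustration and are not part of hosting_site.module; only the resulting pattern comes from the breakdown.

```php
<?php
// Hypothetical helpers for illustration; not code from hosting_site.module.

// Assemble the pattern piece by piece, following the grammar rules.
function build_domain_regex(): string
{
    $letter      = '[a-z]';      // <letter>, upper case is covered by /i
    $let_dig     = '[a-z0-9]';   // <let-dig>
    $let_dig_hyp = '[a-z0-9-]';  // <let-dig-hyp>

    // <label> ::= <letter> [ [ <ldh-str> ] <let-dig> ]
    $label = $letter . '(' . $let_dig_hyp . '*' . $let_dig . ')?';

    // <subdomain>: one or more labels, each optionally followed by a dot.
    return '/^(' . $label . '\.?)+$/i';
}

// True when $name matches the assembled pattern.
function domain_is_valid(string $name): bool
{
    return preg_match(build_domain_regex(), $name) === 1;
}
```

With this pattern, `example.com` and `ex-ample.com` pass, while `example-.com` and `1example.com` fail. Note that `example.com.` also passes, because the dot after each label is optional.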
One thing I haven't quite been able to make sense of is the space (" ") alternative for the domain. Since I left it out, the subdomain and domain patterns are the same. However, after testing, the pattern seems to be working just fine.
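For comparison, here is a sketch of a stricter reading of the grammar. It rests on a few assumptions: the " " alternative is usually read as the root (empty) domain, which a site hostname presumably never needs to match, so it is left out here as well; the dot is required between labels, so a trailing dot is rejected; and the length limits of 63 octets per label and 255 octets per name, taken from the DNS specification, are applied to the text form as an approximation. The function name `strict_domain_is_valid()` is hypothetical.

```php
<?php
// Hypothetical stricter variant; not code from hosting_site.module.
function strict_domain_is_valid(string $name): bool
{
    // <label> ::= <letter> [ [ <ldh-str> ] <let-dig> ]
    $label = '[a-z](?:[a-z0-9-]*[a-z0-9])?';

    // <subdomain> ::= <label> | <subdomain> "." <label>
    // \A and \z anchor the whole string; unlike "$", \z does not
    // match before a trailing newline.
    $regex = '/\A' . $label . '(?:\.' . $label . ')*\z/i';

    // Overall length limit (assumption: applied to the text form).
    if (strlen($name) > 255 || preg_match($regex, $name) !== 1) {
        return false;
    }

    // Per-label length limit.
    foreach (explode('.', $name) as $part) {
        if (strlen($part) > 63) {
            return false;
        }
    }
    return true;
}
```

The main behavioural difference from the pattern above is that `example.com.` and a name followed by a newline are rejected, and overlong labels fail the length check.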
Also, if you are wondering what the heck the syntax used in the specification is (like the assignment "::="), you may want to have a look at the Reduced Backus-Naur Form, a variant of the Backus-Naur Form.
