A + in an e-mail is valid. Can't we stop using validators that don't follow the standard?
This is something that has long irritated me: websites whose form validation (or similar) rejects e-mail addresses with a + sign in the left part.
Background
Why would you want a + in your address? Well - quite a lot of e-mail servers (including gmail - which we'll use for examples here) take:
All of these will be delivered to [email protected]. This allows you to give a unique e-mail address to each site out there while still only needing the one account. It makes filtering in gmail easier, and lets you track who's selling what to others too ;)
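As a rough sketch of that convention: everything between the + and the @ is treated as a tag, and the mail lands in the untagged mailbox. The helper names and the example.com addresses here are made up for illustration, and not every mail server handles + this way.

```python
def add_tag(address: str, tag: str) -> str:
    """Build a tagged variant of an address, e.g. one per website."""
    local, _, domain = address.rpartition("@")
    return f"{local}+{tag}@{domain}"


def strip_tag(address: str) -> str:
    """Return the mailbox a tagged address would be delivered to,
    assuming the server follows the +tag convention."""
    local, _, domain = address.rpartition("@")
    base, _, _tag = local.partition("+")
    return f"{base}@{domain}"


# Hypothetical address, not one from the text above.
print(add_tag("user@example.com", "shop"))   # user+shop@example.com
print(strip_tag("user+shop@example.com"))    # user@example.com
```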
Standards
What says that the + is valid?
E-mail addresses are regulated by a set of RFC documents.
The original was RFC822. This was replaced by RFC2822, which was finally updated for internationalization in RFC5335.
In 2822 - the relevant section is 3.4.1.
An addr-spec is a specific Internet identifier that contains a locally interpreted string followed by the at-sign character ("@", ASCII value 64) followed by an Internet domain. The locally interpreted string is either a quoted-string or a dot-atom. If the string can be represented as a dot-atom (that is, it contains no characters other than atext characters or "." surrounded by atext characters), then the dot-atom form SHOULD be used and the quoted-string form SHOULD NOT be used. Comments and folding white space SHOULD NOT be used around the "@" in the addr-spec.
What this says is that the valid chars for the local-part (left of the @ sign) are made up of one of the following:
- dot-atom
- quoted-string
- obs-local-part
If you dig through - dot-atom is made up from dot-atom-text - which in turn is made up of atext.
    atext = ALPHA / DIGIT / ; Any character except controls,
            "!" / "#" /     ;  SP, and specials.
            "$" / "%" /     ;  Used for atoms
            "&" / "'" /
            "*" / "+" /
            "-" / "/" /
            "=" / "?" /
            "^" / "_" /
            "`" / "{" /
            "|" / "}" /
            "~"
All of these characters are valid in the left part - the + included.
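To make the grammar concrete, here is a minimal sketch of a dot-atom check built directly from the atext list in RFC2822. The names (ATEXT, is_dot_atom) are my own, and it only covers the dot-atom form - quoted-string and obs-local-part are deliberately left out, so this is not a complete validator.

```python
import re
import string

# atext per RFC 2822: letters, digits, and these symbols (note the "+").
ATEXT_SYMBOLS = "!#$%&'*+-/=?^_`{|}~"
ATEXT = string.ascii_letters + string.digits + ATEXT_SYMBOLS

# dot-atom-text = 1*atext *("." 1*atext)
_ATEXT_CLASS = "[" + re.escape(ATEXT) + "]"
DOT_ATOM_RE = re.compile(rf"{_ATEXT_CLASS}+(?:\.{_ATEXT_CLASS}+)*")


def is_dot_atom(local_part: str) -> bool:
    """True if local_part is a dot-atom; quoted-string forms are not handled."""
    return DOT_ATOM_RE.fullmatch(local_part) is not None


print(is_dot_atom("foo+bar"))   # True
print(is_dot_atom("foo..bar"))  # False: dots must be surrounded by atext
print(is_dot_atom("foo bar"))   # False: space is not atext
```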
822 used a slightly different layout - local-part is made up of atom or quoted string, atom is one or more "any CHAR except specials, SPACE and CTLs" - and the specials do not include +.
5335 replaces atext with utf8-atext - guess what - the + is still there ;)
Issues
There seem to be a lot of broken validators out there that have an incorrect set of characters. The one that gives the most issues seems to be the + sign. Not sure why - but it seems quite a common issue.
Even worse is when it's allowed on the registration page but rejected by the login page (where e-mail is used at login).
A smaller issue is sites that fail to handle the encoding of the + when displaying it (a + urldecodes to a space - so you have to encode it first). This is just broken webpage design - but when the confirmation page displays the address as "foo [email protected]" instead of "[email protected]", and the form wizard then fails with an email validation error on that confirmation page (usually without the ability to edit the field) - that's just as annoying.
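The encoding problem is easy to reproduce with Python's standard urllib.parse. This is only a sketch of the failure and the fix; the address is a made-up example.com one, and a real site would be doing this inside whatever web framework it uses.

```python
from urllib.parse import parse_qs, quote, unquote_plus, urlencode

address = "foo+bar@example.com"  # hypothetical address, for illustration only

# Form-style decoding turns a raw "+" into a space, which corrupts the address.
broken = unquote_plus(address)

# Percent-encode first ("+" becomes %2B), and decoding gives the original back.
encoded = quote(address, safe="")
round_trip = unquote_plus(encoded)

# urlencode() applies the same escaping when building a query string.
query = urlencode({"email": address})
from_query = parse_qs(query)["email"][0]

print(broken)      # foo bar@example.com
print(encoded)     # foo%2Bbar%40example.com
print(round_trip)  # foo+bar@example.com
print(query)       # email=foo%2Bbar%40example.com
```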
Summary
If you're going to validate e-mail addresses (a good idea) then please use a validation library that is correct. Don't assume that you know what an e-mail looks like - the valid combinations are wildly more varied than most people realize. Don't assume a simple regular expression is good enough (see this regex).
Oh - and asking your users to change their e-mail due to your broken validation is not something that makes them feel good towards your brand!