[Mailman-Developers] E-mail validation code

Charlie Clark charlie at begeistert.org
Fri Jan 31 11:52:33 EST 2003


Dear list,

I've recently had a whole load of e-mail addresses to check and have been 
looking for a way to help with this. I was pointed in the direction of 
Mailman/Utils.py:ValidateEmail by Danny Yoo on the Python tutor list.

It looks like this:
###
_badchars = re.compile('[][()<>|;^,]')

def ValidateEmail(str):
    """Verify that an email address isn't grossly invalid."""
    # Pretty minimal, cheesy check.  We could do better...
    if not str:
        raise Errors.MMBadEmailError
    if _badchars.search(str) or str[0] == '-':
        raise Errors.MMHostileAddress
    if string.find(str, '/') <> -1 and \
       os.path.isdir(os.path.split(str)[0]):
        # then
        raise Errors.MMHostileAddress
    user, domain_parts = ParseEmail(str)
    # this means local, unqualified addresses are not allowed
    if not domain_parts:
        raise Errors.MMBadEmailError
    if len(domain_parts) < 2:
        raise Errors.MMBadEmailError
###

My unskilled eye agrees with the comment.
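
To see just how minimal, here is a small sketch that runs only the 
bad-character step from the function above on a few obviously broken 
strings. The sample addresses and the helper name passes_badchars_check 
are my own, and the rest of ValidateEmail (the ParseEmail step and the 
domain check) is left out, so this shows one step rather than the whole 
function.

```python
import re

# Same character class as Mailman's _badchars: ] [ ( ) < > | ; ^ ,
_badchars = re.compile('[][()<>|;^,]')

def passes_badchars_check(address):
    """Hypothetical helper: True if the bad-character step would not object."""
    if not address:
        return False
    return not (_badchars.search(address) or address[0] == '-')

# Made-up sample strings, not real addresses.
samples = ['joe@testmail.com', 'joe smith@testmail.com',
           'joe@@testmail.com', 'joe<at>testmail.com', '-joe@testmail.com']
for address in samples:
    print('%-25s %s' % (address, passes_badchars_check(address)))
```

The strings with a space and with @@ get through this step, since neither 
character is in the list of bad characters.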

Having trawled the web for alternatives, I've come across the following 
approaches:

The Python stuff is at
http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/66439/index_txt

It is a module with a single class which comes with some built-in patterns 
and the ability to define custom patterns. The built-in e-mail test doesn't 
catch spaces, a doubled @@ and similar problems.

The perl stuff is at
http://aspn.activestate.com/ASPN/Cookbook/Rx/Recipe/68432/index_txt
sub ValidEmailAddr { # check if e-mail address format is valid
  my $mail = shift;  # in form name@host
  # characters allowed on name: 0-9a-Z-._ on host: 0-9a-Z-. on between: @
  return 0 if ( $mail !~ /^[0-9a-zA-Z\.\-\_]+\@[0-9a-zA-Z\.\-]+$/ );
  # must start or end with alpha or num
  return 0 if ( $mail =~ /^[^0-9a-zA-Z]|[^0-9a-zA-Z]$/);
  # name must end with alpha or num
  return 0 if ( $mail !~ /([0-9a-zA-Z]{1})\@./ );
  # host must start with alpha or num
  return 0 if ( $mail !~ /.\@([0-9a-zA-Z]{1})/ );
  # pair .- or -. or -- or .. not allowed
  return 0 if ( $mail =~ /.\.\-.|.\-\..|.\.\..|.\-\-./g );
  # pair ._ or -_ or _. or _- or __ not allowed
  return 0 if ( $mail =~ /.\.\_.|.\-\_.|.\_\..|.\_\-.|.\_\_./g );
  # host must end with '.' plus 2 or 3 alpha for TopLevelDomain
  # (MUST be modified in future!)
  return 0 if ( $mail !~ /\.([a-zA-Z]{2,3})$/ );
  return 1;
}

This seems to catch pretty much everything, but it's Perl and I'm not sure 
what !~ and =~ do.

I've started work on making custom definitions based on the Perl source, 
like this:

# StringValidator is the class from the ASPN Python recipe linked above
sv1 = StringValidator("joe@testmail.com")
sv1.definePattern("test1", r"^[0-9a-zA-Z\.\-\_]+\@[0-9a-zA-Z\.\-]+$")
sv1.definePattern("test2", r"^[^0-9a-zA-Z]|[^0-9a-zA-Z]$")
if not sv1.isValidForPattern("test1"):
    print("%s has invalid characters" % sv1.validateString)
# test2 describes a bad address, so a match here means failure
elif sv1.isValidForPattern("test2"):
    print("%s doesn't start or end with alpha or num" % sv1.validateString)
else:
    print("%s is valid" % sv1.validateString)

These tests work pretty well, but I'm having trouble turning the Perl lines 
with =~ into usable Python code: they tend to invalidate real addresses.
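
For what it's worth, here is roughly how I imagine the whole Perl routine 
would look using Python's re module directly. It assumes that =~ means 
"the pattern matches somewhere in the string" and that !~ means the 
opposite, which would correspond to re.search rather than re.match. The 
function name valid_email_addr and the sample strings are made up. It is 
a sketch of a port, not a finished replacement for ValidateEmail.

```python
import re

def valid_email_addr(mail):
    """Rough port of the Perl ValidEmailAddr checks; the name is made up."""
    # Assumption: Perl's =~ acts like re.search (match anywhere in the
    # string) and !~ is simply its negation.
    # Allowed characters: name 0-9a-zA-Z._- then one @ then host 0-9a-zA-Z.-
    if not re.search(r'^[0-9a-zA-Z._-]+@[0-9a-zA-Z.-]+$', mail):
        return False
    # Must not start or end with anything other than a letter or digit.
    if re.search(r'^[^0-9a-zA-Z]|[^0-9a-zA-Z]$', mail):
        return False
    # Name must end with a letter or digit.
    if not re.search(r'[0-9a-zA-Z]@.', mail):
        return False
    # Host must start with a letter or digit.
    if not re.search(r'.@[0-9a-zA-Z]', mail):
        return False
    # Pairs .- -. .. -- are not allowed.
    if re.search(r'.\.-.|.-\..|.\.\..|.--.', mail):
        return False
    # Pairs ._ -_ _. _- __ are not allowed.
    if re.search(r'.\._.|.-_.|._\..|._-.|.__.', mail):
        return False
    # Host must end with a dot plus 2 or 3 letters.
    if not re.search(r'\.[a-zA-Z]{2,3}$', mail):
        return False
    return True

# Made-up sample strings.
for address in ['joe@testmail.com', 'joe smith@testmail.com',
                'joe@@testmail.com', 'joe..smith@testmail.com',
                'joe@testmail.info']:
    print('%-25s %s' % (address, valid_email_addr(address)))
```

Note that the lines translated from =~ return False when the pattern 
matches, while the lines translated from !~ return False when it does 
not. The last check also rejects four-letter top-level domains such as 
.info, which is the limitation the Perl comment itself warns about.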

I've also been given this link:
http://www.interclasse.com/scripts/EMailValidatorCLS.php

I'll confess to not being much of a programmer, but I'm sure I can come up 
with an improvement on the current function with a little help.

Thank you.

Charlie
-- 
Charlie Clark
Helmholtzstr. 20
Düsseldorf
D- 40215
Tel: +49-211-938-5360


