[Python-checkins] r65700 - in python/trunk: Lib/email/message.py Misc/NEWS

Jack Diederich jackdied at jackdied.com
Sat Aug 16 05:02:09 CEST 2008


On Fri, Aug 15, 2008 at 11:03:21PM +0200, antoine.pitrou wrote:

See my comments on the tracker: http://bugs.python.org/issue2676
This patch is still suboptimal for bad cases, though it is better than the
old behavior. My patch and my tracker comments explain why.

-Jack

> Log:
> #2676: email/message.py [Message.get_content_type]: Trivial regex hangs on pathological input
> 
> 
> 
> Modified:
>    python/trunk/Lib/email/message.py
>    python/trunk/Misc/NEWS
> 
> Modified: python/trunk/Lib/email/message.py
> ==============================================================================
> --- python/trunk/Lib/email/message.py	(original)
> +++ python/trunk/Lib/email/message.py	Fri Aug 15 23:03:21 2008
> @@ -19,18 +19,22 @@
>  
>  SEMISPACE = '; '
>  
> -# Regular expression used to split header parameters.  BAW: this may be too
> -# simple.  It isn't strictly RFC 2045 (section 5.1) compliant, but it catches
> -# most headers found in the wild.  We may eventually need a full fledged
> -# parser eventually.
> -paramre = re.compile(r'\s*;\s*')
>  # Regular expression that matches `special' characters in parameters, the
>  # existance of which force quoting of the parameter value.
>  tspecials = re.compile(r'[ \(\)<>@,;:\\"/\[\]\?=]')
>  
>  
> -
>  # Helper functions
> +def _splitparam(param):
> +    # Split header parameters.  BAW: this may be too simple.  It isn't
> +    # strictly RFC 2045 (section 5.1) compliant, but it catches most headers
> +    # found in the wild.  We may eventually need a full fledged parser
> +    # eventually.
> +    a, sep, b = param.partition(';')
> +    if not sep:
> +        return a.strip(), None
> +    return a.strip(), b.strip()
> +
>  def _formatparam(param, value=None, quote=True):
>      """Convenience function to format and return a key=value pair.
>  
> @@ -436,7 +440,7 @@
>          if value is missing:
>              # This should have no parameters
>              return self.get_default_type()
> -        ctype = paramre.split(value)[0].lower().strip()
> +        ctype = _splitparam(value)[0].lower()
>          # RFC 2045, section 5.2 says if its invalid, use text/plain
>          if ctype.count('/') != 1:
>              return 'text/plain'
> 
> Modified: python/trunk/Misc/NEWS
> ==============================================================================
> --- python/trunk/Misc/NEWS	(original)
> +++ python/trunk/Misc/NEWS	Fri Aug 15 23:03:21 2008
> @@ -48,6 +48,10 @@
>  Library
>  -------
>  
> +- Issue #2676: in the email package, content-type parsing was hanging on
> +  pathological input because of quadratic or exponential behaviour of a
> +  regular expression.
> +
>  - Issue #3476: binary buffered reading through the new "io" library is now
>    thread-safe.
>  
> _______________________________________________
> Python-checkins mailing list
> Python-checkins at python.org
> http://mail.python.org/mailman/listinfo/python-checkins
> 

