[Python-checkins] python/dist/src/Lib/email Generator.py,1.16,1.17

bwarsaw@users.sourceforge.net
Mon, 14 Oct 2002 08:09:36 -0700


Update of /cvsroot/python/python/dist/src/Lib/email
In directory usw-pr-cvs1:/tmp/cvs-serv24305/email

Modified Files:
	Generator.py 
Log Message:
_split_header(): If we have a header which is a byte string containing
8-bit data, we cannot split it safely, so return the original string
unchanged.

_is8bitstring(): Helper function which returns True when we have a
byte string that contains non-ASCII characters (i.e. 8-bit data of
unknown encoding).
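
For readers on current Python, the check and the guard described above
can be sketched roughly as follows. This is an illustration only, not
the code in this checkin: it assumes Python 3, where byte strings are
`bytes` rather than `StringType`, and the names is_8bit_bytes() and
split_unless_8bit(), along with the `splitter` callable, are
hypothetical.

```python
def is_8bit_bytes(s):
    """Return True if `s` is a byte string containing non-ASCII bytes."""
    if isinstance(s, bytes):
        try:
            # Decoding fails as soon as a byte >= 0x80 is encountered.
            s.decode('us-ascii')
        except UnicodeError:
            return True
    # Text strings and pure-ASCII byte strings are not "8-bit".
    return False


def split_unless_8bit(text, splitter):
    """Apply `splitter` to `text` unless it holds raw 8-bit data.

    `splitter` is a hypothetical stand-in for the real header-splitting
    logic.  The encoding of raw 8-bit data is unknown, so splitting it
    could cut a multibyte character in half; the value is returned
    unchanged instead, at the risk of an over-long line.
    """
    if is_8bit_bytes(text):
        return text
    return splitter(text)
```

With this sketch, split_unless_8bit(b'caf\xe9 au lait', bytes.split)
hands back the original bytes untouched, while a pure-ASCII value is
passed through to the splitter as usual.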


Index: Generator.py
===================================================================
RCS file: /cvsroot/python/python/dist/src/Lib/email/Generator.py,v
retrieving revision 1.16
retrieving revision 1.17
diff -C2 -d -r1.16 -r1.17
*** Generator.py	28 Sep 2002 18:04:55 -0000	1.16
--- Generator.py	14 Oct 2002 15:09:30 -0000	1.17
***************
*** 9,13 ****
  import random
  
! from types import ListType
  from cStringIO import StringIO
  
--- 9,13 ----
  import random
  
! from types import ListType, StringType
  from cStringIO import StringIO
  
***************
*** 36,39 ****
--- 36,47 ----
  fcre = re.compile(r'^From ', re.MULTILINE)
  
+ def _is8bitstring(s):
+     if isinstance(s, StringType):
+         try:
+             unicode(s, 'us-ascii')
+         except UnicodeError:
+             return True
+     return False
+ 
  
  
***************
*** 174,177 ****
--- 182,193 ----
              # No line was actually longer than maxheaderlen characters, so
              # just return the original unchanged.
+             return text
+         # If we have raw 8bit data in a byte string, we have no idea what the
+         # encoding is.  I think there is no safe way to split this string.  If
+         # it's ascii-subset, then we could do a normal ascii split, but if
+         # it's multibyte then we could break the string.  There's no way to
+         # know so the least harm seems to be to not split the string and risk
+         # it being too long.
+         if _is8bitstring(text):
              return text
          # The `text' argument already has the field name prepended, so don't