[Python-checkins] r57344 - in sandbox/trunk/emailpkg/5_0-exp/email: charset.py header.py message.py quoprimime.py test/test_email.py

barry.warsaw python-checkins at python.org
Thu Aug 23 22:37:37 CEST 2007


Author: barry.warsaw
Date: Thu Aug 23 22:37:37 2007
New Revision: 57344

Modified:
   sandbox/trunk/emailpkg/5_0-exp/email/charset.py
   sandbox/trunk/emailpkg/5_0-exp/email/header.py
   sandbox/trunk/emailpkg/5_0-exp/email/message.py
   sandbox/trunk/emailpkg/5_0-exp/email/quoprimime.py
   sandbox/trunk/emailpkg/5_0-exp/email/test/test_email.py
Log:
Progress on the email package for 3.0.  Even though the failure rate is up, I
believe this is farther along the path of 'being right' than before.

The big changes are related to header splitting, folding and quopri encoding:

 - Continuation whitespace is now only a hint.  Header.encode() preserves
   existing whitespace as much as possible when doing RFC 2822 folding, which
   improves idempotency and RFC compliance.  This means that tests which split
   on whitespace now preserve their whitespace instead of forcing everything
   to tabs.
 - When folding occurs on something other than whitespace (e.g. ';'), the
   line split happens after the split character, as per the RFC.
 - Header.append() accepts bytes or strings.  When a string is given, it is
   first converted to bytes using the charset's input codec.  Then, in both
   cases, the bytes are converted back to a string using the charset's output
   codec.  If any of those steps fail, we let the resulting exception
   percolate up.  Internally, we store the chunks as strings and charsets
   because this is the best way to split and fold header values for output.
 - charset.header_encode() has been greatly simplified.  It no longer accepts
   a 'convert' flag.  It requires a string, which it first converts to bytes
   using the charset's output codec and then encodes as q or b as appropriate.
 - Similarly, quoprimime.header_encode() no longer accepts keep_eols,
   maxlinelen, or eol arguments, and it no longer does anything except encode
   what you give it using the header quopri algorithm.
 - quoprimime.header_encode() and .header_quopri_len() now take byte arrays
   instead of strings (a usage sketch appears after the quoprimime.py diff
   below).
 - The method of splitting and fitting header values into maxlinelen
   characters has been completely rewritten.  It may not be perfect or
   efficient, but it should conform more closely to the RFCs and work better
   with strictly unicode chunks.  Also, any adjacent chunks of the same
   charset are first collapsed before being split, so you may see different,
   though more efficient, encodings.
 - Fixed the broken holdover of __str__() and __unicode__() doing two
   different things.  __str__() is no longer a synonym for .encode() -- use
   that directly.  __unicode__() has been renamed to __str__().  The
   difference is that .encode() returns the RFC 2047 encoded header value
   while str() returns the unicode concatenation of the chunks (see the
   sketch after this list).
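
To illustrate the new Header semantics, here is a rough sketch (the header
name and sample text are made up, and the exact folded output depends on
maxlinelen and the splitting code, so none is shown):

    from email.charset import Charset
    from email.header import Header

    latin1 = Charset('iso-8859-1')
    h = Header('Hello', latin1, header_name='Subject')
    h.append(b'K\xf6ln', latin1)  # bytes are decoded with the charset's output codec
    h.append('world')             # strings are round-tripped through the input/output codecs

    h.encode()   # the RFC 2047 encoded, RFC 2822 folded header value
    str(h)       # the plain unicode concatenation of the chunks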

Other changes include:
 - Use 'in' instead of .has_key().
 - Fixed the __all__ test to not expect the old module names.
 - Get rid of the bogus 8bit charset.
 - Added an __iter__() to email.message.Message for iterating over the
   message's headers.  Also added an __len__().
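
And a tiny sketch of the new Message iteration protocol (the headers here
are made up):

    import email

    msg = email.message_from_string(
        'From: Anne Person <aperson>\n'
        'To: Bart Person <bperson>\n'
        'Subject: a test\n'
        '\n'
        'A body.\n')

    len(msg)                    # 3 -- the number of header fields
    [field for field in msg]    # ['From', 'To', 'Subject']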


Modified: sandbox/trunk/emailpkg/5_0-exp/email/charset.py
==============================================================================
--- sandbox/trunk/emailpkg/5_0-exp/email/charset.py	(original)
+++ sandbox/trunk/emailpkg/5_0-exp/email/charset.py	Thu Aug 23 22:37:37 2007
@@ -57,8 +57,6 @@
     'iso-2022-jp': (BASE64,    None,    None),
     'koi8-r':      (BASE64,    BASE64,  None),
     'utf-8':       (SHORTEST,  BASE64, 'utf-8'),
-    # We're making this one up to represent raw unencoded 8-bit
-    '8bit':        (None,      BASE64, 'utf-8'),
     }
 
 # Aliases for other commonly-used names for character sets.  Map
@@ -341,36 +339,33 @@
         else:
             return len(s)
 
-    def header_encode(self, s, convert=False):
-        """Header-encode a string, optionally converting it to output_charset.
+    def header_encode(self, string):
+        """Header-encode a string by converting it first to bytes.
 
-        If convert is True, the string will be converted from the input
-        charset to the output charset automatically.  This is not useful for
-        multibyte character sets, which have line length issues (multibyte
-        characters must be split on a character, not a byte boundary); use the
-        high-level Header class to deal with these issues.  convert defaults
-        to False.
+        :param string: A unicode string for the header.  This must be
+        encodable to bytes using the current character set's `output_codec`.
 
         The type of encoding (base64 or quoted-printable) will be based on
-        self.header_encoding.
+        this charset's `header_encoding`.
         """
-        cset = self.get_output_charset()
-        if convert:
-            s = self.convert(s)
+        codec = self.output_codec or 'us-ascii'
+        charset = self.get_output_charset()
+        header_bytes = string.encode(codec)
         # 7bit/8bit encodings return the string unchanged (modulo conversions)
         if self.header_encoding == BASE64:
-            return email.base64mime.header_encode(s, cset)
+            encoder = email.base64mime.header_encode
         elif self.header_encoding == QP:
-            return email.quoprimime.header_encode(s, cset, maxlinelen=None)
+            encoder = email.quoprimime.header_encode
         elif self.header_encoding == SHORTEST:
-            lenb64 = email.base64mime.base64_len(s)
-            lenqp = email.quoprimime.header_quopri_len(s)
+            lenb64 = email.base64mime.base64_len(header_bytes)
+            lenqp = email.quoprimime.header_quopri_len(header_bytes)
             if lenb64 < lenqp:
-                return email.base64mime.header_encode(s, cset)
+                encoder = email.base64mime.header_encode
             else:
-                return email.quoprimime.header_encode(s, cset, maxlinelen=None)
+                encoder = email.quoprimime.header_encode
         else:
-            return s
+            return string
+        return encoder(header_bytes, codec)
 
     def body_encode(self, s, convert=True):
         """Body-encode a string and convert it to output_charset.

Modified: sandbox/trunk/emailpkg/5_0-exp/email/header.py
==============================================================================
--- sandbox/trunk/emailpkg/5_0-exp/email/header.py	(original)
+++ sandbox/trunk/emailpkg/5_0-exp/email/header.py	Thu Aug 23 22:37:37 2007
@@ -180,35 +180,24 @@
         """
         if charset is None:
             charset = USASCII
-        if not isinstance(charset, Charset):
+        elif not isinstance(charset, Charset):
             charset = Charset(charset)
         self._charset = charset
         self._continuation_ws = continuation_ws
-        cws_expanded_len = len(continuation_ws.replace('\t', SPACE8))
-        # BAW: I believe `chunks' and `maxlinelen' should be non-public.
         self._chunks = []
         if s is not None:
             self.append(s, charset, errors)
         if maxlinelen is None:
             maxlinelen = MAXLINELEN
+        self._maxlinelen = maxlinelen
         if header_name is None:
-            # We don't know anything about the field header so the first line
-            # is the same length as subsequent lines.
-            self._firstlinelen = maxlinelen
+            self._headerlen = 0
         else:
-            # The first line should be shorter to take into account the field
-            # header.  Also subtract off 2 extra for the colon and space.
-            self._firstlinelen = maxlinelen - len(header_name) - 2
-        # Second and subsequent lines should subtract off the length in
-        # columns of the continuation whitespace prefix.
-        self._maxlinelen = maxlinelen - cws_expanded_len
-
-##     def __str__(self):
-##         """A synonym for self.encode()."""
-##         return self.encode()
+            # Take the separating colon and space into account.
+            self._headerlen = len(header_name) + 2
 
     def __str__(self):
-        """Helper for the built-in unicode function."""
+        """Return the string value of the header."""
         uchunks = []
         lastcs = None
         for s, charset in self._chunks:
@@ -225,7 +214,7 @@
                 elif nextcs not in (None, 'us-ascii'):
                     uchunks.append(SPACE)
             lastcs = nextcs
-            uchunks.append(str(s, str(charset)))
+            uchunks.append(s)
         return EMPTYSTRING.join(uchunks)
 
     # Rich comparison operators for equality only.  BAW: does it make sense to
@@ -263,123 +252,22 @@
             charset = self._charset
         elif not isinstance(charset, Charset):
             charset = Charset(charset)
-        # If the charset is our faux 8bit charset, leave the string unchanged
-        if charset != '8bit':
-            # We need to test that the string can be converted to unicode and
-            # back to a byte string, given the input and output codecs of the
-            # charset.
-            if isinstance(s, bytes):
-                # Possibly raise UnicodeError if the byte string can't be
-                # converted to a unicode with the input codec of the charset.
-                incodec = charset.input_codec or 'us-ascii'
-                ustr = str(s, incodec, errors)
-                # Now make sure that the unicode could be converted back to a
-                # byte string with the output codec, which may be different
-                # than the iput coded.  Still, use the original byte string.
-                outcodec = charset.output_codec or 'us-ascii'
-                ustr.encode(outcodec, errors)
-            elif isinstance(s, bytes):
-                # Now we have to be sure the unicode string can be converted
-                # to a byte string with a reasonable output codec.  We want to
-                # use the byte string in the chunk.
-                for charset in USASCII, charset, UTF8:
-                    try:
-                        outcodec = charset.output_codec or 'us-ascii'
-                        s = s.encode(outcodec, errors)
-                        break
-                    except UnicodeError:
-                        pass
-                else:
-                    assert False, 'utf-8 conversion failed'
-        self._chunks.append((s, charset))
-
-    def _split(self, s, charset, maxlinelen, splitchars):
-        # Split up a header safely for use with encode_chunks.
-        splittable = charset.to_splittable(s)
-        encoded = charset.from_splittable(splittable, True)
-        elen = charset.encoded_header_len(encoded)
-        # If the line's encoded length first, just return it
-        if elen <= maxlinelen:
-            return [(encoded, charset)]
-        # If we have undetermined raw 8bit characters sitting in a byte
-        # string, we really don't know what the right thing to do is.  We
-        # can't really split it because it might be multibyte data which we
-        # could break if we split it between pairs.  The least harm seems to
-        # be to not split the header at all, but that means they could go out
-        # longer than maxlinelen.
-        if charset == '8bit':
-            return [(s, charset)]
-        # BAW: I'm not sure what the right test here is.  What we're trying to
-        # do is be faithful to RFC 2822's recommendation that ($2.2.3):
-        #
-        # "Note: Though structured field bodies are defined in such a way that
-        #  folding can take place between many of the lexical tokens (and even
-        #  within some of the lexical tokens), folding SHOULD be limited to
-        #  placing the CRLF at higher-level syntactic breaks."
-        #
-        # For now, I can only imagine doing this when the charset is us-ascii,
-        # although it's possible that other charsets may also benefit from the
-        # higher-level syntactic breaks.
-        elif charset == 'us-ascii':
-            return self._split_ascii(s, charset, maxlinelen, splitchars)
-        # BAW: should we use encoded?
-        elif elen == len(s):
-            # We can split on _maxlinelen boundaries because we know that the
-            # encoding won't change the size of the string
-            splitpnt = maxlinelen
-            first = charset.from_splittable(splittable[:splitpnt], False)
-            last = charset.from_splittable(splittable[splitpnt:], False)
+        if isinstance(s, str):
+            # Convert the string from the input character set to the output
+            # character set and store the resulting bytes and the charset for
+            # composition later.
+            input_charset = charset.input_codec or 'us-ascii'
+            input_bytes = s.encode(input_charset, errors)
         else:
-            # Binary search for split point
-            first, last = _binsplit(splittable, charset, maxlinelen)
-        # first is of the proper length so just wrap it in the appropriate
-        # chrome.  last must be recursively split.
-        fsplittable = charset.to_splittable(first)
-        fencoded = charset.from_splittable(fsplittable, True)
-        chunk = [(fencoded, charset)]
-        return chunk + self._split(last, charset, self._maxlinelen, splitchars)
-
-    def _split_ascii(self, s, charset, firstlen, splitchars):
-        chunks = _split_ascii(s, firstlen, self._maxlinelen,
-                              self._continuation_ws, splitchars)
-        return zip(chunks, [charset]*len(chunks))
-
-    def _encode_chunks(self, newchunks, maxlinelen):
-        # MIME-encode a header with many different charsets and/or encodings.
-        #
-        # Given a list of pairs (string, charset), return a MIME-encoded
-        # string suitable for use in a header field.  Each pair may have
-        # different charsets and/or encodings, and the resulting header will
-        # accurately reflect each setting.
-        #
-        # Each encoding can be email.Utils.QP (quoted-printable, for
-        # ASCII-like character sets like iso-8859-1), email.Utils.BASE64
-        # (Base64, for non-ASCII like character sets like KOI8-R and
-        # iso-2022-jp), or None (no encoding).
-        #
-        # Each pair will be represented on a separate line; the resulting
-        # string will be in the format:
-        #
-        # =?charset1?q?Mar=EDa_Gonz=E1lez_Alonso?=\n
-        #  =?charset2?b?SvxyZ2VuIEL2aW5n?="
-        chunks = []
-        for header, charset in newchunks:
-            if not header:
-                continue
-            if charset is None or charset.header_encoding is None:
-                s = header
-            else:
-                s = charset.header_encode(header)
-            # Don't add more folding whitespace than necessary
-            if chunks and chunks[-1].endswith(' '):
-                extra = ''
-            else:
-                extra = ' '
-            _max_append(chunks, s, maxlinelen, extra)
-        joiner = NL + self._continuation_ws
-        return joiner.join(chunks)
+            # We already have the bytes we will store internally.
+            input_bytes = s
+        # Ensure that the bytes we're storing can be decoded to the output
+        # character set, otherwise an early error is thrown.
+        output_charset = charset.output_codec or 'us-ascii'
+        output_string = input_bytes.decode(output_charset, errors)
+        self._chunks.append((output_string, charset))
 
-    def encode(self, splitchars=';, '):
+    def encode(self, splitchars=';, \t'):
         """Encode a message header into an RFC-compliant format.
 
         There are many issues involved in converting a given string for use in
@@ -401,118 +289,232 @@
         ASCII lines on, in rough support of RFC 2822's `highest level
         syntactic breaks'.  This doesn't affect RFC 2047 encoded lines.
         """
-        newchunks = []
-        maxlinelen = self._firstlinelen
-        lastlen = 0
-        for s, charset in self._chunks:
-            # The first bit of the next chunk should be just long enough to
-            # fill the next line.  Don't forget the space separating the
-            # encoded words.
-            targetlen = maxlinelen - lastlen - 1
-            if targetlen < charset.encoded_header_len(''):
-                # Stick it on the next line
-                targetlen = maxlinelen
-            newchunks += self._split(s, charset, targetlen, splitchars)
-            lastchunk, lastcharset = newchunks[-1]
-            lastlen = lastcharset.encoded_header_len(lastchunk)
-        return self._encode_chunks(newchunks, maxlinelen)
+        self._normalize()
+        formatter = _ValueFormatter(self._headerlen, self._maxlinelen,
+                                    self._continuation_ws, splitchars)
+        for string, charset in self._chunks:
+            lines = string.splitlines()
+            for line in lines:
+                formatter.feed(line, charset)
+                if len(lines) > 1:
+                    formatter.newline()
+        return str(formatter)
+
+    def _normalize(self):
+        # Normalize the chunks so that all runs of identical charsets get
+        # collapsed into a single unicode string.  You need a space between
+        # encoded words, or between encoded and unencoded words.
+        chunks = []
+        last_charset = None
+        last_chunk = []
+        for string, charset in self._chunks:
+            if charset == last_charset:
+                last_chunk.append(string)
+            else:
+                if last_charset is not None:
+                    chunks.append((SPACE.join(last_chunk), last_charset))
+                    if last_charset != USASCII or charset != USASCII:
+                        chunks.append((' ', USASCII))
+                last_chunk = [string]
+                last_charset = charset
+        if last_chunk:
+            chunks.append((SPACE.join(last_chunk), last_charset))
+        self._chunks = chunks
 
 
 
-def _split_ascii(s, firstlen, restlen, continuation_ws, splitchars):
-    lines = []
-    maxlen = firstlen
-    for line in s.splitlines():
-        # Ignore any leading whitespace (i.e. continuation whitespace) already
-        # on the line, since we'll be adding our own.
-        line = line.lstrip()
-        if len(line) < maxlen:
-            lines.append(line)
-            maxlen = restlen
-            continue
+class _ValueFormatter:
+    def __init__(self, headerlen, maxlen, continuation_ws, splitchars):
+        self._maxlen = maxlen
+        self._continuation_ws = continuation_ws
+        self._continuation_ws_len = len(continuation_ws.replace('\t', SPACE8))
+        self._splitchars = splitchars
+        self._lines = []
+        self._current_line = _Accumulator(headerlen)
+
+    def __str__(self):
+        self.newline()
+        return NL.join(self._lines)
+
+    def newline(self):
+        if len(self._current_line) > 0:
+            self._lines.append(str(self._current_line))
+        self._current_line.reset()
+
+    def feed(self, string, charset):
+        # If the string itself fits on the current line in its encoded format,
+        # then add it now and be done with it.
+        encoded_string = charset.header_encode(string)
+        if len(encoded_string) + len(self._current_line) <= self._maxlen:
+            self._current_line.push(encoded_string)
+            return
         # Attempt to split the line at the highest-level syntactic break
         # possible.  Note that we don't have a lot of smarts about field
         # syntax; we just try to break on semi-colons, then commas, then
-        # whitespace.
-        for ch in splitchars:
-            if ch in line:
+        # whitespace.  Eventually, we'll allow this to be pluggable.
+        for ch in self._splitchars:
+            if ch in string:
                 break
         else:
-            # There's nothing useful to split the line on, not even spaces, so
-            # just append this line unchanged
-            lines.append(line)
-            maxlen = restlen
-            continue
-        # Now split the line on the character plus trailing whitespace
-        cre = re.compile(r'%s\s*' % ch)
-        if ch in ';,':
-            eol = ch
-        else:
-            eol = ''
-        joiner = eol + ' '
-        joinlen = len(joiner)
-        wslen = len(continuation_ws.replace('\t', SPACE8))
-        this = []
-        linelen = 0
-        for part in cre.split(line):
-            curlen = linelen + max(0, len(this)-1) * joinlen
-            partlen = len(part)
-            onfirstline = not lines
-            # We don't want to split after the field name, if we're on the
-            # first line and the field name is present in the header string.
-            if ch == ' ' and onfirstline and \
-                   len(this) == 1 and fcre.match(this[0]):
-                this.append(part)
-                linelen += partlen
-            elif curlen + partlen > maxlen:
-                if this:
-                    lines.append(joiner.join(this) + eol)
-                # If this part is longer than maxlen and we aren't already
-                # splitting on whitespace, try to recursively split this line
-                # on whitespace.
-                if partlen > maxlen and ch != ' ':
-                    subl = _split_ascii(part, maxlen, restlen,
-                                        continuation_ws, ' ')
-                    lines.extend(subl[:-1])
-                    this = [subl[-1]]
+            # We can't split the string to fit on the current line, so just
+            # put it on a line by itself.
+            self._lines.append(str(self._current_line))
+            self._current_line.reset(self._continuation_ws)
+            self._current_line.push(encoded_string)
+            return
+        self._spliterate(string, ch, charset)
+
+    def _spliterate(self, string, ch, charset):
+        holding = _Accumulator(transformfunc=charset.header_encode)
+        # Split the line on the split character, preserving it.  If the split
+        # character is whitespace RFC 2822 $2.2.3 requires us to fold on the
+        # whitespace, so that the line leads with the original whitespace we
+        # split on.  However, if a higher syntactic break is used instead
+        # (e.g. comma or semicolon), the folding should happen after the split
+        # character.  But then in that case, we need to add our own
+        # continuation whitespace -- although won't that break unfolding?
+        for part, splitpart, nextpart in _spliterator(ch, string):
+            if not splitpart:
+                # No splitpart means this is the last chunk.  Put this part
+                # either on the current line or the next line depending on
+                # whether it fits.
+                holding.push(part)
+                if len(holding) + len(self._current_line) <= self._maxlen:
+                    # It fits, but we're done.
+                    self._current_line.push(str(holding))
+                else:
+                    # It doesn't fit, but we're done.  Before pushing a new
+                    # line, watch out for the current line containing only
+                    # whitespace.
+                    holding.pop()
+                    if len(self._current_line) == 0 and (
+                        len(holding) == 0 or str(holding).isspace()):
+                        # Don't start a new line.
+                        holding.push(part)
+                        part = None
+                    self._current_line.push(str(holding))
+                    self._lines.append(str(self._current_line))
+                    if part is None:
+                        self._current_line.reset()
+                    else:
+                        holding.reset(part)
+                        self._current_line.reset(str(holding))
+                return
+            elif not nextpart:
+                # There must be some trailing split characters because we
+                # found a split character but no next part.  In this case we
+                # must treat the thing to fit as the part + splitpart because
+                # if splitpart is whitespace it's not allowed to be the only
+                # thing on the line, and if it's not whitespace we must split
+                # after the syntactic break.  In either case, we're done.
+                holding_prelen = len(holding)
+                holding.push(part + splitpart)
+                if len(holding) + len(self._current_line) <= self._maxlen:
+                    self._current_line.push(str(holding))
+                elif holding_prelen == 0:
+                    # This is the only chunk left so it has to go on the
+                    # current line.
+                    self._current_line.push(str(holding))
                 else:
-                    this = [part]
-                linelen = wslen + len(this[-1])
-                maxlen = restlen
+                    save_part = holding.pop()
+                    self._current_line.push(str(holding))
+                    self._lines.append(str(self._current_line))
+                    holding.reset(save_part)
+                    self._current_line.reset(str(holding))
+                return
+            elif not part:
+                # We're leading with a split character.  See if the splitpart
+                # and nextpart fits on the current line.
+                holding.push(splitpart + nextpart)
+                holding_len = len(holding)
+                # We know we're not leaving the nextpart on the stack.
+                holding.pop()
+                if holding_len + len(self._current_line) <= self._maxlen:
+                    holding.push(splitpart)
+                else:
+                    # It doesn't fit.  Since there's no current part really
+                    # the best we can do is start a new line and push the
+                    # split part onto it.
+                    self._current_line.push(str(holding))
+                    holding.reset()
+                    if len(self._current_line) > 0 and self._lines:
+                        self._lines.append(str(self._current_line))
+                        self._current_line.reset()
+                    holding.push(splitpart)
             else:
-                this.append(part)
-                linelen += partlen
-        # Put any left over parts on a line by themselves
-        if this:
-            lines.append(joiner.join(this))
-    return lines
+                # All three parts are present.  First let's see if all three
+                # parts will fit on the current line.  If so, we don't need to
+                # split it.
+                holding.push(part + splitpart + nextpart)
+                holding_len = len(holding)
+                # Pop the part because we'll push nextpart on the next
+                # iteration through the loop.
+                holding.pop()
+                if holding_len + len(self._current_line) <= self._maxlen:
+                    holding.push(part + splitpart)
+                else:
+                    # The entire thing doesn't fit.  See if we need to split
+                    # before or after the split characters.
+                    if splitpart.isspace():
+                        # Split before whitespace.  Remember that the
+                        # whitespace becomes the continuation whitespace of
+                        # the next line so it goes to current_line not holding.
+                        holding.push(part)
+                        self._current_line.push(str(holding))
+                        holding.reset()
+                        self._lines.append(str(self._current_line))
+                        self._current_line.reset(splitpart)
+                    else:
+                        # Split after non-whitespace.  The continuation
+                        # whitespace comes from the instance variable.
+                        holding.push(part + splitpart)
+                        self._current_line.push(str(holding))
+                        holding.reset()
+                        self._lines.append(str(self._current_line))
+                        if nextpart[0].isspace():
+                            self._current_line.reset()
+                        else:
+                            self._current_line.reset(self._continuation_ws)
+        # Get the last of the holding part
+        self._current_line.push(str(holding))
 
 
 
-def _binsplit(splittable, charset, maxlinelen):
-    i = 0
-    j = len(splittable)
-    while i < j:
-        # Invariants:
-        # 1. splittable[:k] fits for all k <= i (note that we *assume*,
-        #    at the start, that splittable[:0] fits).
-        # 2. splittable[:k] does not fit for any k > j (at the start,
-        #    this means we shouldn't look at any k > len(splittable)).
-        # 3. We don't know about splittable[:k] for k in i+1..j.
-        # 4. We want to set i to the largest k that fits, with i <= k <= j.
-        #
-        m = (i+j+1) >> 1  # ceiling((i+j)/2); i < m <= j
-        chunk = charset.from_splittable(splittable[:m], True)
-        chunklen = charset.encoded_header_len(chunk)
-        if chunklen <= maxlinelen:
-            # m is acceptable, so is a new lower bound.
-            i = m
+def _spliterator(character, string):
+    parts = list(reversed(re.split('(%s)' % character, string)))
+    while parts:
+        part = parts.pop()
+        splitparts = (parts.pop() if parts else None)
+        nextpart = (parts.pop() if parts else None)
+        yield (part, splitparts, nextpart)
+        if nextpart is not None:
+            parts.append(nextpart)
+
+
+class _Accumulator:
+    def __init__(self, initial_size=0, transformfunc=None):
+        self._initial_size = initial_size
+        if transformfunc is None:
+            self._transformfunc = lambda string: string
         else:
-            # m is not acceptable, so final i must be < m.
-            j = m - 1
-    # i == j.  Invariant #1 implies that splittable[:i] fits, and
-    # invariant #2 implies that splittable[:i+1] does not fit, so i
-    # is what we're looking for.
-    first = charset.from_splittable(splittable[:i], False)
-    last  = charset.from_splittable(splittable[i:], False)
-    return first, last
+            self._transformfunc = transformfunc
+        self._current = []
+
+    def push(self, string):
+        self._current.append(string)
+
+    def pop(self):
+        return self._current.pop()
+
+    def __len__(self):
+        return len(str(self)) + self._initial_size
+
+    def __str__(self):
+        return self._transformfunc(EMPTYSTRING.join(self._current))
+
+    def reset(self, string=None):
+        self._current = []
+        self._current_len = 0
+        self._initial_size = 0
+        if string is not None:
+            self.push(string)
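
For anyone following the folding rewrite, this standalone sketch mirrors the
new _spliterator() helper above and shows what it yields for a made-up value:

    import re

    def _spliterator(character, string):
        # Split on the character, preserving it, and yield (part, splitpart,
        # nextpart) triples with one part of lookahead.
        parts = list(reversed(re.split('(%s)' % character, string)))
        while parts:
            part = parts.pop()
            splitpart = (parts.pop() if parts else None)
            nextpart = (parts.pop() if parts else None)
            yield (part, splitpart, nextpart)
            if nextpart is not None:
                parts.append(nextpart)

    list(_spliterator(';', 'a; b; c'))
    # => [('a', ';', ' b'), (' b', ';', ' c'), (' c', None, None)]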

Modified: sandbox/trunk/emailpkg/5_0-exp/email/message.py
==============================================================================
--- sandbox/trunk/emailpkg/5_0-exp/email/message.py	(original)
+++ sandbox/trunk/emailpkg/5_0-exp/email/message.py	Thu Aug 23 22:37:37 2007
@@ -125,7 +125,7 @@
         "From ".  For more flexibility, use the flatten() method of a
         Generator instance.
         """
-        from email.Generator import Generator
+        from email.generator import Generator
         fp = StringIO()
         g = Generator(fp, mangle_from_=False, maxheaderlen=maxheaderlen)
         g.flatten(self, unixfrom=unixfrom)
@@ -312,10 +312,12 @@
     def __contains__(self, name):
         return name.lower() in [k.lower() for k, v in self._headers]
 
-    def has_key(self, name):
-        """Return true if the message contains the header."""
-        missing = object()
-        return self.get(name, missing) is not missing
+    def __iter__(self):
+        for field, value in self._headers:
+            yield field
+
+    def __len__(self):
+        return len(self._headers)
 
     def keys(self):
         """Return a list of all the message's header field names.
@@ -785,4 +787,4 @@
         return [part.get_content_charset(failobj) for part in self.walk()]
 
     # I.e. def walk(self): ...
-    from email.Iterators import walk
+    from email.iterators import walk

Modified: sandbox/trunk/emailpkg/5_0-exp/email/quoprimime.py
==============================================================================
--- sandbox/trunk/emailpkg/5_0-exp/email/quoprimime.py	(original)
+++ sandbox/trunk/emailpkg/5_0-exp/email/quoprimime.py	Thu Aug 23 22:37:37 2007
@@ -45,54 +45,57 @@
 
 import re
 
-from string import hexdigits
+from string import ascii_letters, digits, hexdigits
 from email.utils import fix_eols
 
 CRLF = '\r\n'
 NL = '\n'
+EMPTYSTRING = ''
 
 # See also Charset.py
 MISC_LEN = 7
 
-hqre = re.compile(r'[^-a-zA-Z0-9!*+/ ]')
-bqre = re.compile(r'[^ !-<>-~\t]')
+HEADER_SAFE_BYTES = b'-!*+/ ' + bytes(ascii_letters) + bytes(digits)
+BODY_SAFE_BYTES   = (b' !"#$%&\'()*+,-./0123456789:;<>'
+                     b'?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`'
+                     b'abcdefghijklmnopqrstuvwxyz{|}~\t')
 
 
 
 # Helpers
 def header_quopri_check(c):
     """Return True if the character should be escaped with header quopri."""
-    return bool(hqre.match(c))
+    return c not in HEADER_SAFE_BYTES
 
 
 def body_quopri_check(c):
     """Return True if the character should be escaped with body quopri."""
-    return bool(bqre.match(c))
+    return c not in BODY_SAFE_BYTES
 
 
-def header_quopri_len(s):
-    """Return the length of str when it is encoded with header quopri."""
+def header_quopri_len(bytearray):
+    """Return the length of bytearray when it is encoded with header quopri.
+
+    Note that this does not include any RFC 2047 chrome added by
+    `header_encode()`.
+    """
     count = 0
-    for c in s:
-        if hqre.match(c):
-            count += 3
-        else:
-            count += 1
+    for c in bytearray:
+        count += (3 if header_quopri_check(c) else 1)
     return count
 
 
-def body_quopri_len(str):
-    """Return the length of str when it is encoded with body quopri."""
+def body_quopri_len(bytearray):
+    """Return the length of bytearray when it is encoded with body quopri."""
     count = 0
-    for c in str:
-        if bqre.match(c):
-            count += 3
-        else:
-            count += 1
+    for c in bytearray:
+        count += (3 if body_quopri_check(c) else 1)
     return count
 
 
 def _max_append(L, s, maxlen, extra=''):
+    if not isinstance(s, str):
+        s = chr(s)
     if not L:
         L.append(s.lstrip())
     elif len(L[-1]) + len(s) <= maxlen:
@@ -107,12 +110,11 @@
 
 
 def quote(c):
-    return "=%02X" % ord(c)
+    return '=%02X' % ord(c)
 
 
 
-def header_encode(header, charset="iso-8859-1", keep_eols=False,
-                  maxlinelen=76, eol=NL):
+def header_encode(header_bytes, charset='iso-8859-1'):
     """Encode a single header line with quoted-printable (like) encoding.
 
     Defined in RFC 2045, this `Q' encoding is similar to quoted-printable, but
@@ -120,58 +122,27 @@
     bit characters (and some 8 bit) to remain more or less readable in non-RFC
     2045 aware mail clients.
 
-    charset names the character set to use to encode the header.  It defaults
-    to iso-8859-1.
-
-    The resulting string will be in the form:
-
-    "=?charset?q?I_f=E2rt_in_your_g=E8n=E8ral_dire=E7tion?\\n
-      =?charset?q?Silly_=C8nglish_Kn=EEghts?="
-
-    with each line wrapped safely at, at most, maxlinelen characters (defaults
-    to 76 characters).  If maxlinelen is None, the entire string is encoded in
-    one chunk with no splitting.
-
-    End-of-line characters (\\r, \\n, \\r\\n) will be automatically converted
-    to the canonical email line separator \\r\\n unless the keep_eols
-    parameter is True (the default is False).
-
-    Each line of the header will be terminated in the value of eol, which
-    defaults to "\\n".  Set this to "\\r\\n" if you are using the result of
-    this function directly in email.
+    charset names the character set to use in the RFC 2046 header.  It
+    defaults to iso-8859-1.
     """
     # Return empty headers unchanged
-    if not header:
-        return header
-
-    if not keep_eols:
-        header = fix_eols(header)
-
-    # Quopri encode each line, in encoded chunks no greater than maxlinelen in
-    # length, after the RFC chrome is added in.
-    quoted = []
-    if maxlinelen is None:
-        # An obnoxiously large number that's good enough
-        max_encoded = 100000
-    else:
-        max_encoded = maxlinelen - len(charset) - MISC_LEN - 1
-
-    for c in header:
+    if not header_bytes:
+        return str(header_bytes)
+    # Iterate over every byte, encoding if necessary.
+    encoded = []
+    for character in header_bytes:
         # Space may be represented as _ instead of =20 for readability
-        if c == ' ':
-            _max_append(quoted, '_', max_encoded)
-        # These characters can be included verbatim
-        elif not hqre.match(c):
-            _max_append(quoted, c, max_encoded)
+        if character == ord(' '):
+            encoded.append('_')
+        # These characters can be included verbatim.
+        elif not header_quopri_check(character):
+            encoded.append(chr(character))
         # Otherwise, replace with hex value like =E2
         else:
-            _max_append(quoted, "=%02X" % ord(c), max_encoded)
-
+            encoded.append('=%02X' % character)
     # Now add the RFC chrome to each encoded chunk and glue the chunks
-    # together.  BAW: should we be able to specify the leading whitespace in
-    # the joiner?
-    joiner = eol + ' '
-    return joiner.join(['=?%s?q?%s?=' % (charset, line) for line in quoted])
+    # together.
+    return '=?%s?q?%s?=' % (charset, EMPTYSTRING.join(encoded))
 
 
 
@@ -221,7 +192,7 @@
         for j in range(linelen):
             c = line[j]
             prev = c
-            if bqre.match(c):
+            if body_quopri_check(c):
                 c = quote(c)
             elif j+1 == linelen:
                 # Check for whitespace at end of line; special case
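
A minimal example of the slimmed-down quoprimime.header_encode(), matching
the updated tests below -- it takes bytes and returns a single, unfolded
encoded word:

    from email import quoprimime

    quoprimime.header_encode(b'hello\xc7there')
    # => '=?iso-8859-1?q?hello=C7there?='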

Modified: sandbox/trunk/emailpkg/5_0-exp/email/test/test_email.py
==============================================================================
--- sandbox/trunk/emailpkg/5_0-exp/email/test/test_email.py	(original)
+++ sandbox/trunk/emailpkg/5_0-exp/email/test/test_email.py	Thu Aug 23 22:37:37 2007
@@ -9,7 +9,9 @@
 import difflib
 import unittest
 import warnings
+
 from io import StringIO
+from itertools import chain
 
 import email
 
@@ -304,12 +306,13 @@
         self.assertEqual(msg.get_param('name', unquote=False),
                          '"Jim&amp;&amp;Jill"')
 
-    def test_has_key(self):
+    def test_field_containment(self):
+        unless = self.failUnless
         msg = email.message_from_string('Header: exists')
-        self.failUnless(msg.has_key('header'))
-        self.failUnless(msg.has_key('Header'))
-        self.failUnless(msg.has_key('HEADER'))
-        self.failIf(msg.has_key('headeri'))
+        unless('header' in msg)
+        unless('Header' in msg)
+        unless('HEADER' in msg)
+        self.failIf('headerx' in msg)
 
     def test_set_param(self):
         eq = self.assertEqual
@@ -543,7 +546,7 @@
 bug demonstration
 \t12345678911234567892123456789312345678941234567895123456789612345678971234567898112345678911234567892123456789112345678911234567892123456789
 \tmore text""")
-        h = Header(hstr)
+        h = Header(hstr.replace('\t', ' '))
         eq(h.encode(), """\
 bug demonstration
  12345678911234567892123456789312345678941234567895123456789612345678971234567898112345678911234567892123456789112345678911234567892123456789
@@ -554,9 +557,20 @@
         g = Charset("iso-8859-1")
         cz = Charset("iso-8859-2")
         utf8 = Charset("utf-8")
-        g_head = "Die Mieter treten hier ein werden mit einem Foerderband komfortabel den Korridor entlang, an s\xfcdl\xfcndischen Wandgem\xe4lden vorbei, gegen die rotierenden Klingen bef\xf6rdert. "
-        cz_head = "Finan\xe8ni metropole se hroutily pod tlakem jejich d\xf9vtipu.. "
-        utf8_head = "\u6b63\u78ba\u306b\u8a00\u3046\u3068\u7ffb\u8a33\u306f\u3055\u308c\u3066\u3044\u307e\u305b\u3093\u3002\u4e00\u90e8\u306f\u30c9\u30a4\u30c4\u8a9e\u3067\u3059\u304c\u3001\u3042\u3068\u306f\u3067\u305f\u3089\u3081\u3067\u3059\u3002\u5b9f\u969b\u306b\u306f\u300cWenn ist das Nunstuck git und Slotermeyer? Ja! Beiherhund das Oder die Flipperwaldt gersput.\u300d\u3068\u8a00\u3063\u3066\u3044\u307e\u3059\u3002".encode("utf-8")
+        g_head = (b'Die Mieter treten hier ein werden mit einem Foerderband '
+                  b'komfortabel den Korridor entlang, an s\xfcdl\xfcndischen '
+                  b'Wandgem\xe4lden vorbei, gegen die rotierenden Klingen '
+                  b'bef\xf6rdert. ')
+        cz_head = (b'Finan\xe8ni metropole se hroutily pod tlakem jejich '
+                   b'd\xf9vtipu.. ')
+        utf8_head = ('\u6b63\u78ba\u306b\u8a00\u3046\u3068\u7ffb\u8a33\u306f'
+                     '\u3055\u308c\u3066\u3044\u307e\u305b\u3093\u3002\u4e00'
+                     '\u90e8\u306f\u30c9\u30a4\u30c4\u8a9e\u3067\u3059\u304c'
+                     '\u3001\u3042\u3068\u306f\u3067\u305f\u3089\u3081\u3067'
+                     '\u3059\u3002\u5b9f\u969b\u306b\u306f\u300cWenn ist das '
+                     'Nunstuck git und Slotermeyer? Ja! Beiherhund das Oder '
+                     'die Flipperwaldt gersput.\u300d\u3068\u8a00\u3063\u3066'
+                     '\u3044\u307e\u3059\u3002')
         h = Header(g_head, g, header_name='Subject')
         h.append(cz_head, cz)
         h.append(utf8_head, utf8)
@@ -601,7 +615,7 @@
 wasnipoop; giraffes="very-long-necked-animals";
  spooge="yummy"; hippos="gargantuan"; marshmallows="gooey"''')
 
-    def test_long_header_encode_with_tab_continuation(self):
+    def test_long_header_encode_with_tab_continuation_is_just_a_hint(self):
         eq = self.ndiffAssertEqual
         h = Header('wasnipoop; giraffes="very-long-necked-animals"; '
                    'spooge="yummy"; hippos="gargantuan"; marshmallows="gooey"',
@@ -609,6 +623,16 @@
                    continuation_ws='\t')
         eq(h.encode(), '''\
 wasnipoop; giraffes="very-long-necked-animals";
+ spooge="yummy"; hippos="gargantuan"; marshmallows="gooey"''')
+
+    def test_long_header_encode_with_tab_continuation(self):
+        eq = self.ndiffAssertEqual
+        h = Header('wasnipoop; giraffes="very-long-necked-animals";\t'
+                   'spooge="yummy"; hippos="gargantuan"; marshmallows="gooey"',
+                   header_name='X-Foobar-Spoink-Defrobnit',
+                   continuation_ws='\t')
+        eq(h.encode(), '''\
+wasnipoop; giraffes="very-long-necked-animals";
 \tspooge="yummy"; hippos="gargantuan"; marshmallows="gooey"''')
 
     def test_header_splitter(self):
@@ -627,7 +651,7 @@
 MIME-Version: 1.0
 Content-Transfer-Encoding: 7bit
 X-Foobar-Spoink-Defrobnit: wasnipoop; giraffes="very-long-necked-animals";
-\tspooge="yummy"; hippos="gargantuan"; marshmallows="gooey"
+ spooge="yummy"; hippos="gargantuan"; marshmallows="gooey"
 
 ''')
 
@@ -635,7 +659,7 @@
         eq = self.ndiffAssertEqual
         msg = Message()
         msg['From'] = 'test at dom.ain'
-        msg['References'] = SPACE.join(['<%d at dom.ain>' % i for i in range(10)])
+        msg['References'] = SPACE.join('<%d at dom.ain>' % i for i in range(10))
         msg.set_payload('Test')
         sfp = StringIO()
         g = Generator(sfp)
@@ -643,7 +667,7 @@
         eq(sfp.getvalue(), """\
 From: test at dom.ain
 References: <0 at dom.ain> <1 at dom.ain> <2 at dom.ain> <3 at dom.ain> <4 at dom.ain>
-\t<5 at dom.ain> <6 at dom.ain> <7 at dom.ain> <8 at dom.ain> <9 at dom.ain>
+ <5 at dom.ain> <6 at dom.ain> <7 at dom.ain> <8 at dom.ain> <9 at dom.ain>
 
 Test""")
 
@@ -664,17 +688,17 @@
         h = Header(hstr, continuation_ws='\t')
         eq(h.encode(), """\
 from babylon.socal-raves.org (localhost [127.0.0.1]);
-\tby babylon.socal-raves.org (Postfix) with ESMTP id B570E51B81;
-\tfor <mailman-admin at babylon.socal-raves.org>;
-\tSat, 2 Feb 2002 17:00:06 -0800 (PST)
+ by babylon.socal-raves.org (Postfix) with ESMTP id B570E51B81;
+ for <mailman-admin at babylon.socal-raves.org>;
+ Sat, 2 Feb 2002 17:00:06 -0800 (PST)
 \tfrom babylon.socal-raves.org (localhost [127.0.0.1]);
-\tby babylon.socal-raves.org (Postfix) with ESMTP id B570E51B81;
-\tfor <mailman-admin at babylon.socal-raves.org>;
-\tSat, 2 Feb 2002 17:00:06 -0800 (PST)
+ by babylon.socal-raves.org (Postfix) with ESMTP id B570E51B81;
+ for <mailman-admin at babylon.socal-raves.org>;
+ Sat, 2 Feb 2002 17:00:06 -0800 (PST)
 \tfrom babylon.socal-raves.org (localhost [127.0.0.1]);
-\tby babylon.socal-raves.org (Postfix) with ESMTP id B570E51B81;
-\tfor <mailman-admin at babylon.socal-raves.org>;
-\tSat, 2 Feb 2002 17:00:06 -0800 (PST)""")
+ by babylon.socal-raves.org (Postfix) with ESMTP id B570E51B81;
+ for <mailman-admin at babylon.socal-raves.org>;
+ Sat, 2 Feb 2002 17:00:06 -0800 (PST)""")
 
     def test_splitting_first_line_only_is_long(self):
         eq = self.ndiffAssertEqual
@@ -687,7 +711,7 @@
                    continuation_ws='\t')
         eq(h.encode(), """\
 from modemcable093.139-201-24.que.mc.videotron.ca ([24.201.139.93]
-\thelo=cthulhu.gerg.ca)
+ helo=cthulhu.gerg.ca)
 \tby kronos.mems-exchange.org with esmtp (Exim 4.05)
 \tid 17k4h5-00034i-00
 \tfor test at mems-exchange.org; Wed, 28 Aug 2002 11:25:20 -0400""")
@@ -700,8 +724,8 @@
         h.append('gr\xfcnes Licht f\xfcr Offshore-Windkraftprojekte')
         msg['Subject'] = h
         eq(msg.as_string(), """\
-Subject: =?iso-8859-1?q?Britische_Regierung_gibt?= =?iso-8859-1?q?gr=FCnes?=
- =?iso-8859-1?q?_Licht_f=FCr_Offshore-Windkraftprojekte?=
+Subject: =?iso-8859-1?q?Britische_Regierung_gibt_gr=FCnes_Licht_f=FCr?=
+ =?iso-8859-1?q?Offshore-Windkraftprojekte?=
 
 """)
 
@@ -797,13 +821,13 @@
         eq = self.ndiffAssertEqual
         msg = Message()
         t = """\
- iVBORw0KGgoAAAANSUhEUgAAADAAAAAwBAMAAAClLOS0AAAAGFBMVEUAAAAkHiJeRUIcGBi9
+iVBORw0KGgoAAAANSUhEUgAAADAAAAAwBAMAAAClLOS0AAAAGFBMVEUAAAAkHiJeRUIcGBi9
  locQDQ4zJykFBAXJfWDjAAACYUlEQVR4nF2TQY/jIAyFc6lydlG5x8Nyp1Y69wj1PN2I5gzp"""
         msg['Face-1'] = t
         msg['Face-2'] = Header(t, header_name='Face-2')
         eq(msg.as_string(maxheaderlen=78), """\
 Face-1: iVBORw0KGgoAAAANSUhEUgAAADAAAAAwBAMAAAClLOS0AAAAGFBMVEUAAAAkHiJeRUIcGBi9
-\tlocQDQ4zJykFBAXJfWDjAAACYUlEQVR4nF2TQY/jIAyFc6lydlG5x8Nyp1Y69wj1PN2I5gzp
+ locQDQ4zJykFBAXJfWDjAAACYUlEQVR4nF2TQY/jIAyFc6lydlG5x8Nyp1Y69wj1PN2I5gzp
 Face-2: iVBORw0KGgoAAAANSUhEUgAAADAAAAAwBAMAAAClLOS0AAAAGFBMVEUAAAAkHiJeRUIcGBi9
  locQDQ4zJykFBAXJfWDjAAACYUlEQVR4nF2TQY/jIAyFc6lydlG5x8Nyp1Y69wj1PN2I5gzp
 
@@ -1508,6 +1532,8 @@
             (b'r\x8aksm\x9arg\x8cs', 'mac-iceland')])
         header = make_header(dh)
         eq(str(header),
+           'Re: r\xe4ksm\xf6rg\xe5s baz foo bar r\xe4ksm\xf6rg\xe5s')
+        eq(header.encode(),
            """Re: =?mac-iceland?q?r=8Aksm=9Arg=8Cs?= baz foo bar
  =?mac-iceland?q?r=8Aksm=9Arg=8Cs?=""")
 
@@ -2059,17 +2085,10 @@
         all = module.__all__[:]
         all.sort()
         self.assertEqual(all, [
-            # Old names
-            'Charset', 'Encoders', 'Errors', 'Generator',
-            'Header', 'Iterators', 'MIMEAudio', 'MIMEBase',
-            'MIMEImage', 'MIMEMessage', 'MIMEMultipart',
-            'MIMENonMultipart', 'MIMEText', 'Message',
-            'Parser', 'Utils', 'base64MIME',
-            # new names
             'base64mime', 'charset', 'encoders', 'errors', 'generator',
             'header', 'iterators', 'message', 'message_from_file',
             'message_from_string', 'mime', 'parser',
-            'quopriMIME', 'quoprimime', 'utils',
+            'quoprimime', 'utils',
             ])
 
     def test_formatdate(self):
@@ -2475,10 +2494,8 @@
         eq = self.assertEqual
         m = '>From: foo\nFrom: bar\n!"#QUX;~: zoo\n\nbody'
         msg = email.message_from_string(m)
-        eq(len(msg.keys()), 3)
-        keys = msg.keys()
-        keys.sort()
-        eq(keys, ['!"#QUX;~', '>From', 'From'])
+        eq(len(msg), 3)
+        eq(sorted(field for field in msg), ['!"#QUX;~', '>From', 'From'])
         eq(msg.get_payload(), 'body')
 
     def test_rfc2822_space_not_allowed_in_header(self):
@@ -2548,9 +2565,7 @@
         eq(he('hello\nworld'), '=?iso-8859-1?b?aGVsbG8NCndvcmxk?=')
         # Test the charset option
         eq(he('hello', charset='iso-8859-2'), '=?iso-8859-2?b?aGVsbG8=?=')
-        # Test the keep_eols flag
-        eq(he('hello\nworld', keep_eols=True),
-           '=?iso-8859-1?b?aGVsbG8Kd29ybGQ=?=')
+        eq(he('hello\nworld'), '=?iso-8859-1?b?aGVsbG8Kd29ybGQ=?=')
         # Test the maxlinelen argument
         eq(he('xxxx ' * 20, maxlinelen=40), """\
 =?iso-8859-1?b?eHh4eCB4eHh4IHh4eHggeHg=?=
@@ -2572,15 +2587,25 @@
 
 class TestQuopri(unittest.TestCase):
     def setUp(self):
-        self.hlit = [chr(x) for x in range(ord('a'), ord('z')+1)] + \
-                    [chr(x) for x in range(ord('A'), ord('Z')+1)] + \
-                    [chr(x) for x in range(ord('0'), ord('9')+1)] + \
-                    ['!', '*', '+', '-', '/', ' ']
-        self.hnon = [chr(x) for x in range(256) if chr(x) not in self.hlit]
+        # Set of characters (as byte integers) that don't need to be encoded
+        # in headers.
+        self.hlit = list(chain(
+            range(ord('a'), ord('z') + 1),
+            range(ord('A'), ord('Z') + 1),
+            range(ord('0'), ord('9') + 1),
+            (c for c in b'!*+-/ ')))
+        # Set of characters (as byte integers) that do need to be encoded in
+        # headers.
+        self.hnon = [c for c in range(256) if c not in self.hlit]
         assert len(self.hlit) + len(self.hnon) == 256
-        self.blit = [chr(x) for x in range(ord(' '), ord('~')+1)] + ['\t']
-        self.blit.remove('=')
-        self.bnon = [chr(x) for x in range(256) if chr(x) not in self.blit]
+        # Set of characters (as byte integers) that don't need to be encoded
+        # in bodies.
+        self.blit = list(range(ord(' '), ord('~') + 1))
+        self.blit.append(ord('\t'))
+        self.blit.remove(ord('='))
+        # Set of characters (as byte integers) that do need to be encoded in
+        # bodies.
+        self.bnon = [c for c in range(256) if c not in self.blit]
         assert len(self.blit) + len(self.bnon) == 256
 
     def test_header_quopri_check(self):
@@ -2597,15 +2622,24 @@
 
     def test_header_quopri_len(self):
         eq = self.assertEqual
-        hql = quoprimime.header_quopri_len
-        enc = quoprimime.header_encode
-        for s in ('hello', 'h at e@l at l@o@'):
-            # Empty charset and no line-endings.  7 == RFC chrome
-            eq(hql(s), len(enc(s, charset='', eol=''))-7)
+        eq(quoprimime.header_quopri_len(b'hello'), 5)
+        # RFC 2047 chrome is not included in header_quopri_len().
+        eq(len(quoprimime.header_encode(b'hello', charset='xxx')),
+           quoprimime.header_quopri_len(b'hello') +
+           # =?xxx?q?...?= means 10 extra characters
+           10)
+        eq(quoprimime.header_quopri_len(b'h at e@l at l@o@'), 20)
+        # RFC 2047 chrome is not included in header_quopri_len().
+        eq(len(quoprimime.header_encode(b'h at e@l at l@o@', charset='xxx')),
+           quoprimime.header_quopri_len(b'h at e@l at l@o@') +
+           # =?xxx?q?...?= means 10 extra characters
+           10)
         for c in self.hlit:
-            eq(hql(c), 1)
+            eq(quoprimime.header_quopri_len(bytes([c])), 1,
+               'expected length 1 for %r' % chr(c))
         for c in self.hnon:
-            eq(hql(c), 3)
+            eq(quoprimime.header_quopri_len(bytes([c])), 3,
+               'expected length 3 for %r' % chr(c))
 
     def test_body_quopri_len(self):
         eq = self.assertEqual
@@ -2623,28 +2657,11 @@
     def test_header_encode(self):
         eq = self.assertEqual
         he = quoprimime.header_encode
-        eq(he('hello'), '=?iso-8859-1?q?hello?=')
-        eq(he('hello\nworld'), '=?iso-8859-1?q?hello=0D=0Aworld?=')
-        # Test the charset option
-        eq(he('hello', charset='iso-8859-2'), '=?iso-8859-2?q?hello?=')
-        # Test the keep_eols flag
-        eq(he('hello\nworld', keep_eols=True), '=?iso-8859-1?q?hello=0Aworld?=')
+        eq(he(b'hello'), '=?iso-8859-1?q?hello?=')
+        eq(he(b'hello', charset='iso-8859-2'), '=?iso-8859-2?q?hello?=')
+        eq(he(b'hello\nworld'), '=?iso-8859-1?q?hello=0Aworld?=')
         # Test a non-ASCII character
-        eq(he('hello\xc7there'), '=?iso-8859-1?q?hello=C7there?=')
-        # Test the maxlinelen argument
-        eq(he('xxxx ' * 20, maxlinelen=40), """\
-=?iso-8859-1?q?xxxx_xxxx_xxxx_xxxx_xx?=
- =?iso-8859-1?q?xx_xxxx_xxxx_xxxx_xxxx?=
- =?iso-8859-1?q?_xxxx_xxxx_xxxx_xxxx_x?=
- =?iso-8859-1?q?xxx_xxxx_xxxx_xxxx_xxx?=
- =?iso-8859-1?q?x_xxxx_xxxx_?=""")
-        # Test the eol argument
-        eq(he('xxxx ' * 20, maxlinelen=40, eol='\r\n'), """\
-=?iso-8859-1?q?xxxx_xxxx_xxxx_xxxx_xx?=\r
- =?iso-8859-1?q?xx_xxxx_xxxx_xxxx_xxxx?=\r
- =?iso-8859-1?q?_xxxx_xxxx_xxxx_xxxx_x?=\r
- =?iso-8859-1?q?xxx_xxxx_xxxx_xxxx_xxx?=\r
- =?iso-8859-1?q?x_xxxx_xxxx_?=""")
+        eq(he(b'hello\xc7there'), '=?iso-8859-1?q?hello=C7there?=')
 
     def test_decode(self):
         eq = self.assertEqual
@@ -2705,21 +2722,21 @@
         eq = self.assertEqual
         # Try a charset with QP body encoding
         c = Charset('iso-8859-1')
-        eq('hello w=F6rld', c.body_encode('hello w\xf6rld'))
+        eq('hello w=F6rld', c.body_encode(b'hello w\xf6rld'))
         # Try a charset with Base64 body encoding
         c = Charset('utf-8')
-        eq('aGVsbG8gd29ybGQ=\n', c.body_encode('hello world'))
+        eq('aGVsbG8gd29ybGQ=\n', c.body_encode(b'hello world'))
         # Try a charset with None body encoding
         c = Charset('us-ascii')
-        eq('hello world', c.body_encode('hello world'))
+        eq('hello world', c.body_encode(b'hello world'))
         # Try the convert argument, where input codec != output codec
         c = Charset('euc-jp')
         # With apologies to Tokio Kikuchi ;)
         try:
             eq('\x1b$B5FCO;~IW\x1b(B',
-               c.body_encode('\xb5\xc6\xc3\xcf\xbb\xfe\xc9\xd7'))
+               c.body_encode(b'\xb5\xc6\xc3\xcf\xbb\xfe\xc9\xd7'))
             eq('\xb5\xc6\xc3\xcf\xbb\xfe\xc9\xd7',
-               c.body_encode('\xb5\xc6\xc3\xcf\xbb\xfe\xc9\xd7', False))
+               c.body_encode(b'\xb5\xc6\xc3\xcf\xbb\xfe\xc9\xd7', False))
         except LookupError:
             # We probably don't have the Japanese codecs installed
             pass
@@ -2729,7 +2746,7 @@
         from email import charset as CharsetModule
         CharsetModule.add_charset('fake', CharsetModule.QP, None)
         c = Charset('fake')
-        eq('hello w\xf6rld', c.body_encode('hello w\xf6rld'))
+        eq('hello w\xf6rld', c.body_encode(b'hello w\xf6rld'))
 
     def test_unicode_charset_name(self):
         charset = Charset('us-ascii')
@@ -2769,9 +2786,20 @@
         g = Charset("iso-8859-1")
         cz = Charset("iso-8859-2")
         utf8 = Charset("utf-8")
-        g_head = "Die Mieter treten hier ein werden mit einem Foerderband komfortabel den Korridor entlang, an s\xfcdl\xfcndischen Wandgem\xe4lden vorbei, gegen die rotierenden Klingen bef\xf6rdert. "
-        cz_head = "Finan\xe8ni metropole se hroutily pod tlakem jejich d\xf9vtipu.. "
-        utf8_head = "\u6b63\u78ba\u306b\u8a00\u3046\u3068\u7ffb\u8a33\u306f\u3055\u308c\u3066\u3044\u307e\u305b\u3093\u3002\u4e00\u90e8\u306f\u30c9\u30a4\u30c4\u8a9e\u3067\u3059\u304c\u3001\u3042\u3068\u306f\u3067\u305f\u3089\u3081\u3067\u3059\u3002\u5b9f\u969b\u306b\u306f\u300cWenn ist das Nunstuck git und Slotermeyer? Ja! Beiherhund das Oder die Flipperwaldt gersput.\u300d\u3068\u8a00\u3063\u3066\u3044\u307e\u3059\u3002".encode("utf-8")
+        g_head = (b'Die Mieter treten hier ein werden mit einem '
+                  b'Foerderband komfortabel den Korridor entlang, '
+                  b'an s\xfcdl\xfcndischen Wandgem\xe4lden vorbei, '
+                  b'gegen die rotierenden Klingen bef\xf6rdert. ')
+        cz_head = (b'Finan\xe8ni metropole se hroutily pod tlakem jejich '
+                   b'd\xf9vtipu.. ')
+        utf8_head = ('\u6b63\u78ba\u306b\u8a00\u3046\u3068\u7ffb\u8a33\u306f'
+                     '\u3055\u308c\u3066\u3044\u307e\u305b\u3093\u3002\u4e00'
+                     '\u90e8\u306f\u30c9\u30a4\u30c4\u8a9e\u3067\u3059\u304c'
+                     '\u3001\u3042\u3068\u306f\u3067\u305f\u3089\u3081\u3067'
+                     '\u3059\u3002\u5b9f\u969b\u306b\u306f\u300cWenn ist das '
+                     'Nunstuck git und Slotermeyer? Ja! Beiherhund das Oder '
+                     'die Flipperwaldt gersput.\u300d\u3068\u8a00\u3063\u3066'
+                     '\u3044\u307e\u3059\u3002')
         h = Header(g_head, g)
         h.append(cz_head, cz)
         h.append(utf8_head, utf8)
@@ -2813,6 +2841,10 @@
         newh = make_header(decode_header(enc))
         eq(newh, enc)
 
+    def test_empty_header_encode(self):
+        h = Header()
+        self.assertEqual(h.encode(), '')
+        
     def test_header_ctor_default_args(self):
         eq = self.ndiffAssertEqual
         h = Header()
@@ -2822,17 +2854,56 @@
 
     def test_explicit_maxlinelen(self):
         eq = self.ndiffAssertEqual
-        hstr = 'A very long line that must get split to something other than at the 76th character boundary to test the non-default behavior'
+        hstr = ('A very long line that must get split to something other '
+                'than at the 76th character boundary to test the non-default '
+                'behavior')
         h = Header(hstr)
         eq(h.encode(), '''\
 A very long line that must get split to something other than at the 76th
  character boundary to test the non-default behavior''')
+        eq(str(h), hstr)
         h = Header(hstr, header_name='Subject')
         eq(h.encode(), '''\
 A very long line that must get split to something other than at the
  76th character boundary to test the non-default behavior''')
+        eq(str(h), hstr)
         h = Header(hstr, maxlinelen=1024, header_name='Subject')
         eq(h.encode(), hstr)
+        eq(str(h), hstr)
+
+    def test_long_splittables_with_trailing_spaces(self):
+        eq = self.ndiffAssertEqual
+        h = Header(charset='iso-8859-1', maxlinelen=20)
+        h.append('xxxx ' * 20)
+        eq(h.encode(), """\
+=?iso-8859-1?q?xxxx?=
+ =?iso-8859-1?q?xxxx?=
+ =?iso-8859-1?q?xxxx?=
+ =?iso-8859-1?q?xxxx?=
+ =?iso-8859-1?q?xxxx?=
+ =?iso-8859-1?q?xxxx?=
+ =?iso-8859-1?q?xxxx?=
+ =?iso-8859-1?q?xxxx?=
+ =?iso-8859-1?q?xxxx?=
+ =?iso-8859-1?q?xxxx?=
+ =?iso-8859-1?q?xxxx?=
+ =?iso-8859-1?q?xxxx?=
+ =?iso-8859-1?q?xxxx?=
+ =?iso-8859-1?q?xxxx?=
+ =?iso-8859-1?q?xxxx?=
+ =?iso-8859-1?q?xxxx?=
+ =?iso-8859-1?q?xxxx?=
+ =?iso-8859-1?q?xxxx?=
+ =?iso-8859-1?q?xxxx?=
+ =?iso-8859-1?q?xxxx_?=""")
+        h = Header(charset='iso-8859-1', maxlinelen=40)
+        h.append('xxxx ' * 20)
+        eq(h.encode(), """\
+=?iso-8859-1?q?xxxx_xxxx_xxxx_xxxx?=
+ =?iso-8859-1?q?xxxx_xxxx_xxxx_xxxx?=
+ =?iso-8859-1?q?xxxx_xxxx_xxxx_xxxx?=
+ =?iso-8859-1?q?xxxx_xxxx_xxxx_xxxx?=
+ =?iso-8859-1?q?xxxx_xxxx_xxxx_xxxx_?=""")
 
     def test_us_ascii_header(self):
         eq = self.assertEqual
@@ -2867,13 +2938,14 @@
     def test_bad_8bit_header(self):
         raises = self.assertRaises
         eq = self.assertEqual
-        x = 'Ynwp4dUEbay Auction Semiar- No Charge \x96 Earn Big'
+        x = b'Ynwp4dUEbay Auction Semiar- No Charge \x96 Earn Big'
         raises(UnicodeError, Header, x)
         h = Header()
         raises(UnicodeError, h.append, x)
-        eq(str(Header(x, errors='replace')), x)
+        e = x.decode('utf-8', 'replace')
+        eq(str(Header(x, errors='replace')), e)
         h.append(x, errors='replace')
-        eq(str(h), x)
+        eq(str(h), e)
 
     def test_encoded_adjacent_nonencoded(self):
         eq = self.assertEqual
@@ -2938,7 +3010,7 @@
 Subject: This is a test message
 Date: Fri, 4 May 2001 14:05:44 -0400
 Content-Type: text/plain; charset=us-ascii;
-\ttitle*="us-ascii'en'This%20is%20even%20more%20%2A%2A%2Afun%2A%2A%2A%20isn%27t%20it%21"
+ title*="us-ascii'en'This%20is%20even%20more%20%2A%2A%2Afun%2A%2A%2A%20isn%27t%20it%21"
 
 
 Hi,
@@ -2968,7 +3040,7 @@
 Subject: This is a test message
 Date: Fri, 4 May 2001 14:05:44 -0400
 Content-Type: text/plain; charset="us-ascii";
-\ttitle*="us-ascii'en'This%20is%20even%20more%20%2A%2A%2Afun%2A%2A%2A%20isn%27t%20it%21"
+ title*="us-ascii'en'This%20is%20even%20more%20%2A%2A%2Afun%2A%2A%2A%20isn%27t%20it%21"
 
 
 Hi,

