Base64 Decoding

Thu Feb 10 08:48:32 EST 2000

Victor M. J. Ryden:
 |I'm trying to use python to decode some e-mail I'm getting which
 |contain MIME base64 encoded documents.
 |
 |I looked at the docs for the mimetools and base64 modules. I'm confused
 |because they ask for both an input AND output file name. I can understand
 |the input, but not the output. There can be several encoded documents in
 |a single input file, all of which have their respective file names as part
 |of the encoding prelude.
 |
 |Has anyone any experience with this, and do you have any hints?
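On the input/output question: base64.decode() (and likewise mimetools.decode()) takes two open file objects, not file names. It reads encoded text from the first and writes the decoded bytes to the second, so the "output" is simply wherever you want that one decoded document to go. It knows nothing about MIME headers, so splitting the message into parts and picking up each part's filename is a separate job. A minimal sketch in modern Python 3 (not the Python of this thread), with in-memory files standing in for disk files:

```python
import base64
import io

# base64.decode(input, output): both arguments are binary file objects.
# Encoded text is read from `encoded`; decoded bytes are written to `decoded`.
encoded = io.BytesIO(b"SGVsbG8sIHdvcmxkIQ==\n")
decoded = io.BytesIO()
base64.decode(encoded, decoded)
print(decoded.getvalue())  # b'Hello, world!'

# When the encoded text is already a string in memory,
# base64.b64decode() skips the file objects entirely.
print(base64.b64decode("SGVsbG8sIHdvcmxkIQ=="))  # b'Hello, world!'
```

To save to disk instead, pass a file opened with open(name, "wb") as the second argument, one per attachment.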

This should get you reasonably close.  

Attached is a script I use to munge Novell GroupWise mail that I get here
occasionally.  It takes a mail message and emits a modified mail message.
In particular, it parses out application/wordperfect attachments (in base64
or non-standard X-uuencode format), decodes them, and inserts both
plaintext and base64 attachments in the message.  It also handles nested
mail messages.

This saves me from having to pop up WordPerfect to read my mail.
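For the general case in the question (several encoded documents in one message, each carrying its own filename), the email package in later Pythons does the part-splitting and decoding that the script below does by hand with rfc822 and multifile. A rough Python 3 sketch, not the attached script; save_attachments and its arguments are hypothetical names. It assumes the whole message is available as bytes and that every part worth keeping has a filename (Content-Disposition filename= or Content-Type name=):

```python
import email
import os
from email import policy


def save_attachments(raw_bytes, outdir):
    """Decode every part that carries a filename and write it to outdir.

    Hypothetical helper for illustration; returns the paths written.
    """
    msg = email.message_from_bytes(raw_bytes, policy=policy.default)
    written = []
    # walk() visits every part depth-first, including the parts of
    # forwarded message/rfc822 attachments.
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers hold no data of their own
        # Content-Disposition filename=, falling back to Content-Type name=
        filename = part.get_filename()
        if not filename:
            continue
        # decode=True undoes the Content-Transfer-Encoding (base64,
        # quoted-printable, x-uuencode) and returns bytes.
        data = part.get_payload(decode=True)
        # Never trust a sender-supplied path; keep only the last component.
        path = os.path.join(outdir, os.path.basename(filename))
        with open(path, "wb") as out:
            out.write(data)
        written.append(path)
    return written
```

Each named part becomes one output file, which is where the "output file" of base64.decode() ends up in practice: one per attachment, named from the part's own headers.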

-- 
Randall Hopper
aa8vb at yahoo.com
-------------- next part --------------
#!/usr/local/bin/python
#
#  wpmail-cvt  - Converts broken Novell GroupWise X-uuencode mail containing 
#                WordPerfect attachments into true MIME format with base64
#                encoding and content type of application/wordperfect.
#                Also, kicks off WordPerfect, converts the document to
#                text, and inserts that as an attachment in the mail
#                message so we don't have to fire up WordPerfect.
#
#  UPDATES
#    1/19/00 - Now supports nested (forwarded) rfc822 messages
#

WPD_TO_TEXT_CMD = "wp2txt"

import sys, rfc822, multifile, cgi, base64, uu, string, os, StringIO, getopt

def HandleMessage( infile ):

  #
  # Nuke the Lines and Content-Length headers, and print the header
  #
  header = rfc822.Message(infile)
  del header[ 'Lines' ]
  del header[ 'Content-Length' ]

  # Make sure just one blank line after header
  hdr_str = str(header)
  if hdr_str[-1:] != '\n': hdr_str = hdr_str + '\n'
  print "%s%s" % ( header.unixfrom, hdr_str )

  #
  # Parse the message parts
  #
  # A message without a Content-Type header defaults to text/plain
  type, params = cgi.parse_header(header.get("content-type", "text/plain"))

  if type[:10] != "multipart/":
      sys.stdout.write( infile.read() )
  else:
      boundary = params["boundary"]
      file     = multifile.MultiFile(infile, 0)
      file.push(boundary)
      while file.next():
          subheader = rfc822.Message(file)
          # A part without a Content-Type header defaults to text/plain
          type, params = cgi.parse_header(subheader.get('content-type', 'text/plain'))
          buf = StringIO.StringIO( file.read() )

          # Grab the filename
          filename = params.get( 'name' )
          if not filename: filename = "file"

          # If this is one of those non-MIME X-uuencode jobs, convert to base64
          # (encoding names are case-insensitive, so compare in lowercase)
          cte = string.lower( subheader.get( 'content-transfer-encoding', '' ) )
          if cte == 'x-uuencode':
              buf2 = StringIO.StringIO()
              uu.decode( buf, buf2 )
              buf = StringIO.StringIO( base64.encodestring( buf2.getvalue() ) )
              subh_str = ( 'Content-Type: application/wordperfect; name="%s"\n'
                           'Content-Transfer-Encoding: base64\n'
                           'Content-Disposition: attachment; filename="%s"\n' )\
                         % ( filename, filename )
              if subheader.get( 'content-description' ):
                  subh_str = subh_str + 'Content-Description: %s\n' % \
                              subheader.get( 'content-description' )
          else:
              subh_str = str(subheader)

          # If this is a WordPerfect doc, convert it to text and insert
          subheader = rfc822.Message( StringIO.StringIO( subh_str ) )
          type, params = cgi.parse_header(subheader.get('content-type', 'text/plain'))
          if type == 'application/wordperfect':

            # Decode the data, convert to text, and reinsert
            tmp = open( "/tmp/tmpfile", "wb" )
            base64.decode( buf, tmp )
            tmp.close()   # flush to disk before the converter reads it
            os.system( "%s %s %s" %
                       ( WPD_TO_TEXT_CMD, "/tmp/tmpfile", "/tmp/tmpfile.txt" ) )

            print "--%s" % boundary
            print ( "Content-Type: text/plain; name=\"%s.txt\"\n"
                    "Content-Transfer-Encoding: 7bit\n"
                    "Content-Disposition: attachment; filename=\"%s.txt\"\n"
                    "Content-Description: Text\n" % ( filename, filename ) )
            print open( "/tmp/tmpfile.txt", "r" ).read(),

          print "--%s" % boundary
          print subh_str

          # Compare the parsed type; the raw header may carry parameters
          if type == 'message/rfc822':
            HandleMessage( buf )
          else:
            print buf.getvalue(),
      file.pop()
      print "--%s--\n" % boundary

#
# Parse args
#
opts, args = getopt.getopt( sys.argv[1:], 'h' )
if opts or len( args ) > 1:
  print ( "FORMAT: wpmail-cvt [<filename>]\n\n"
          "  <filename> - A file containing a mail message\n\n"
          "  If no filename is specified, input is taken from stdin" )
  sys.exit(1)

#
# Open the input and convert the message
#
if args: infile  = open( args[0] )
else:    infile  = sys.stdin

HandleMessage( infile )
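The X-uuencode branch of the script (uu.decode followed by base64.encodestring) relies on modules that later Python versions dropped: rfc822 and multifile do not exist in Python 3, base64.encodestring was removed in 3.9, and the uu module in 3.13. A sketch of the same conversion in Python 3 using binascii, assuming the part body holds one uuencoded file framed by "begin" and "end" lines; uu_to_base64 is a hypothetical name:

```python
import base64
import binascii


def uu_to_base64(uu_text):
    """Decode one uuencoded file and re-encode it as base64 text.

    Hypothetical helper mirroring the script's X-uuencode branch.
    """
    lines = uu_text.splitlines()
    # Skip everything up to the "begin <mode> <name>" line.
    for start, line in enumerate(lines):
        if line.startswith("begin "):
            break
    else:
        raise ValueError("no 'begin' line found")
    data = bytearray()
    for line in lines[start + 1:]:
        if line.strip() == "end":
            break
        if line:  # each line's first character encodes its byte count
            data += binascii.a2b_uu(line)
    # encodebytes() wraps at 76 characters per line, as MIME requires.
    return base64.encodebytes(bytes(data)).decode("ascii")
```

Feeding it a part's raw body yields base64 text suitable for the body of the new application/wordperfect part.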