help with unicode email parse

Thu Sep 7 07:08:32 EDT 2006

neoedmund wrote:
> I want to get the subject from an email and construct a filename from
> the subject.
> I have tried a lot, but always get an error like this:
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xe9 in position 4:
> ordinal not in range(128)
>
>
> 	msg = email.message_from_string( text )
> 	title = decode_header( msg["Subject"] )
> 	title= title[0][0]
> 	#title=title.encode("utf8")

Why is that commented out?

> 	print title
> 	fn = ""+path+"/"+stamp+"-"+title+".mail"
>
>
> The variable "text" comes from something like this:
> ( header, msg, octets ) = a.retr( i )
> text= list2txt( msg )
> def list2txt( l ):
> 	return reduce( lambda x, y:x+"\r\n"+y, l )
>
> Can anyone help me out? Thanks.

Not without a functional crystal ball.

You could help yourself considerably by (1) working out which line of
code the problem occurs in [the traceback will tell you that] and (2)
working out which string is being decoded into Unicode and has '\xe9'
as its 5th byte. Either that string needs to be decoded using something
like 'latin1' [the charset should be specified in the message headers;
decode_header returns it as the second item of each pair] rather than
the default 'ascii', or the code has a deeper problem ...
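
For illustration only, here is a minimal sketch of decoding the subject
with the charset named in the header instead of the default 'ascii'. It
is written for Python 3 (the code quoted above is Python 2, where the
same idea would be unicode(value, charset)). The function name
subject_to_text, the fallback_charset parameter and the sample message
are made up for the example; they are not from the original post.

```python
from email import message_from_string
from email.header import decode_header


def subject_to_text(raw_message, fallback_charset="latin-1"):
    """Return the Subject header as text (hypothetical helper)."""
    msg = message_from_string(raw_message)
    parts = []
    # decode_header yields (value, charset) pairs; charset is None
    # when the header fragment did not declare one.
    for value, charset in decode_header(msg["Subject"]):
        if isinstance(value, bytes):
            # Assumption: fall back to latin-1 when no charset is given.
            value = value.decode(charset or fallback_charset,
                                 errors="replace")
        parts.append(value)
    return "".join(parts)


# Made-up message whose subject has byte 0xe9 at position 4.
raw = "Subject: =?iso-8859-1?q?Andr=E9_wrote?=\r\n\r\nbody\r\n"
title = subject_to_text(raw)
print(title)
```

Whether a latin-1 fallback is right depends on where the mail comes
from; the charset given in the header should win whenever it is present.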

If you can't work it out for yourself, show us the exact code that ran,
together with the traceback. If (for example) title is the problem,
insert code like:
    print 'title=', repr(title)
and include its output in your next post as well.
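
As a rough idea of what that diagnostic shows, here is a small Python 3
sketch (the print statement above is Python 2 syntax). The value of
title is invented for the example; with real data you would print
whatever decode_header handed back.

```python
# Hypothetical value: latin-1 bytes with 0xe9 as the 5th byte.
title = b"Andr\xe9 wrote"

# repr() shows non-ASCII bytes as escapes instead of hiding them.
shown = "title= " + repr(title)
print(shown)

try:
    title.decode("ascii")
except UnicodeDecodeError as exc:
    # The exception records the codec, the offset and the offending bytes.
    failure = (exc.encoding, exc.start, exc.object[exc.start:exc.end])
    print(failure)
```

The offset reported by the exception is zero-based, which is why
"position 4" in the error message means the 5th byte.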

HTH,
John