help with unicode email parse

neoedmund neoedmund at gmail.com
Thu Sep 7 22:49:37 EDT 2006


Thank you, John and Diez.
I found that
fn = "%s/%s-%s.mail"%("d:/mail", "12345", '\xe6\xb5\x8b\xe8\xaf\x95' )
works, but
fn = "%s/%s-%s.mail"%(u"d:/mail", "12345", '\xe6\xb5\x8b\xe8\xaf\x95' )
raises:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe6 in position 0:
ordinal not in range(128)
So does "str" % (params) not accept a mix of unicode and non-ASCII byte strings, only byte strings on their own?
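
As far as I can tell, the unicode argument turns the whole result into unicode, so the byte string gets decoded implicitly with the default 'ascii' codec, and that is what fails. A minimal sketch of this, assuming the bytes are UTF-8 (they decode cleanly as UTF-8, but that is my assumption, not something the traceback says):

```python
# UTF-8 bytes for the two-character subject used above.
raw = b'\xe6\xb5\x8b\xe8\xaf\x95'

# Mixing unicode into the format forces an implicit ASCII decode of the
# byte string; doing the same decode explicitly shows why it fails.
try:
    raw.decode("ascii")
    implicit_ok = True
except UnicodeDecodeError:
    implicit_ok = False

# Decoding explicitly with the real encoding (assumed UTF-8 here) gives a
# unicode object that can be formatted together with other unicode strings.
subject = raw.decode("utf-8")
fn = u"%s/%s-%s.mail" % (u"d:/mail", u"12345", subject)
```

So the format operator accepts unicode; it only breaks when a non-ASCII byte string has to be converted without being told the encoding.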


John Machin wrote:
> neoedmund wrote:
> > I want to get the subject from an email and construct a filename from
> > it, but whatever I try, I always get an error like this:
> > UnicodeDecodeError: 'ascii' codec can't decode byte 0xe9 in position 4:
> > ordinal not in range(128)
> >
> >
> > 	msg = email.message_from_string( text )
> > 	title = decode_header( msg["Subject"] )
> > 	title= title[0][0]
> > 	#title=title.encode("utf8")
>
> Why is that commented out?
>
> > 	print title
> > 	fn = ""+path+"/"+stamp+"-"+title+".mail"
> >
> >
> > The variable "text" comes from something like this:
> > def list2txt( l ):
> > 	# join the lines returned by retr() with CRLF
> > 	return "\r\n".join( l )
> > ( header, msg, octets ) = a.retr( i )
> > text = list2txt( msg )
> >
> > Can anyone help me out? Thanks.
>
> Not without a functional crystal ball.
>
> You could help yourself considerably by (1) working out which line of
> code the problem occurs in [the traceback will tell you that] (2)
> working out which string is being decoded into Unicode, and has '\xe9'
> as its 5th byte. Either that string needs to be decoded using something
> like 'latin1' [should be specified in the message headers] rather than
> the default 'ascii', or the code has a deeper problem ...
>
> If you can't work it out for yourself, show us the exact code that ran,
> together with the traceback. If (for example) title is the problem,
> insert code like:
>     print 'title=', repr(title)
> and include that in your next post as well.
> 
> HTH,
> John
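
Following John's point about the charset being specified in the message headers, here is a rough sketch of decoding the subject with the charset that decode_header reports, instead of throwing it away with title[0][0]. The helper name subject_to_unicode, the 'latin-1' fallback and the sample header are made up for illustration; they are not from the original code.

```python
from email.header import decode_header

def subject_to_unicode(raw_subject, fallback="latin-1"):
    """Hypothetical helper: turn an RFC 2047 Subject header into unicode."""
    parts = []
    for value, charset in decode_header(raw_subject):
        # Encoded words come back as bytes plus the charset named in the
        # header; decode them explicitly rather than relying on 'ascii'.
        if isinstance(value, bytes):
            value = value.decode(charset or fallback, "replace")
        parts.append(value)
    return u"".join(parts)

# Sample header (an assumption): the same two characters, base64-encoded.
title = subject_to_unicode("=?utf-8?b?5rWL6K+V?=")
fn = u"%s/%s-%s.mail" % (u"d:/mail", u"12345", title)
```

The resulting name may still contain characters that are not legal in a filename on a given system, so some extra cleaning is probably needed before opening the file.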



