Processing text data with different encodings

Chris Angelico rosuav at gmail.com
Tue Jun 28 07:09:42 EDT 2016


On Tue, Jun 28, 2016 at 8:37 PM, Michael Welle <mwe012008 at gmx.net> wrote:
> Steven D'Aprano <steve at pearwood.info> writes:
>
>> On Tue, 28 Jun 2016 06:35 pm, Michael Welle wrote:
>>
>>> my original data is email. The mail header says it's utf-8, but you will
>>> find three or four different encodings in one email. I think at the
>>> sending side they just glue different text fragments from different
>>> sources together without thinking about the encoding.
>>
>> Is this spam? In my experience, the only email that is that badly
>> constructed is spam. I can't imagine how it could be email from a person,
>> coming from a mail client like Thunderbird or Outlook.
> it's mail from an international company. It's not generated by a person
> using an ordinary email client. Other than that you are right ;).

So..... buggy commercial email. Great. Just brilliant.

Can you bill them for your time developing this hack?

ChrisA


