[Mailman-Users] Mime conversions - missing carriage returns and odd characters

Ryan Steele steele at agora-net.com
Mon Mar 5 19:15:04 CET 2007


Mark Sapiro wrote:
> Ryan Steele wrote:
>
>   
>> Mark Sapiro wrote:
>>     
>>> Ryan Steele wrote:
>>>   
>>>       
>>>> Mark Sapiro wrote:
>>>>     
>>>>         
>>>>> Do you have convert_html_to_plaintext set to Yes?
>>>>>   
>>>>>       
>>>>>           
>>>> Affirmative, I do.
>>>>     
>>>>         
>>> Based on what I see, I don't think that is the problem.
>>>       
>
>
> It definitely IS the issue with this one. 
>   
>> I'm still working on getting the client to be able to consistently 
>> reproduce the character misrepresentation (UTF-8) issue, so I won't 
>> bother the list with that for now.  However, here's a good example of 
>> one with stripped carriage returns.  Sorry for the delay.  I didn't post 
>> this to the list because I didn't want a search engine spider crawling 
>> the information in the header... feel free to post a reply with those 
>> snipped!
>>
>> Before making it to the list (carriage return omission example):
>>
>> ######################################################################################
>>
>>     
> <snip>
>   
>> X-Mailer: QUALCOMM Windows Eudora Version 7.1.0.9
>> Content-type: text/html; charset=us-ascii
>>     
> <snip>
>   
>> <html>
>> <body>
>> Hello, Ryan<br><br>
>> I am writing each line flush left<br><br>
>> And, I have inserted a CR between each line<br><br>
>> I am copying you on the original<br><br>
>> And sending a copy to test2<br><br>
>> I think you'll see that the text shifts 2 spaces to the right<br><br>
>> And that all the CRs are deleted<br><br>
>> And that urls get footnoted.<br><br>
>> For instance,
>>
>> <a href="http://www.odnetwork.org/" eudora="autourl">www.odnetwork.org<br>
>> <br>
>> </a>Plus the url in my sig block<br><br>
>> I am typing bold in <b>bold </b>and italics in <i>italics<br><br>
>> </i>I hope that helps.<br><br>
>> Bill you will also get a copy of this, which you can ignore.<br><br>
>> Matt<br>
>>
>>     
> <snip>
> </body>
>   
>> </html>
>>
>> ######################################################################################
>>
>> After making it to the list (carriage return omission example):
>>
>> ######################################################################################
>>
>>     
> <snip>
>   
>> X-Content-Filtered-By: Mailman/MimeDel 2.1.8
>>     
> <snip>
>   
>> Content-Type: text/plain; charset="us-ascii"
>> Content-Transfer-Encoding: 7bit
>>     
> <snip>
>   
>>
>>   Hello, Ryan
>>   I am writing each line flush left
>>   And, I have inserted a CR between each line
>>   I am copying you on the original
>>   And sending a copy to test2
>>   I think you'll see that the text shifts 2 spaces to the right
>>   And that all the CRs are deleted
>>   And that urls get footnoted.
>>   For instance, [1]www.odnetwork.org
>>   Plus the url in my sig block
>>   I am typing bold in bold and italics in italics
>>   I hope that helps.
>>   Bill you will also get a copy of this, which you can ignore.
>>   Matt
>>
>>     
> <snip>
>   
>> References
>>
>>   1. http://www.odnetwork.org/
>>     
> <snip>
>
>
> The original post was html only, thus it was converted to plain text.
> The indentation and footnoting of hyperlinks is all done by your
> HTML_TO_PLAIN_TEXT_COMMAND (default = '/usr/bin/lynx -dump
> %(filename)s'). I'm not sure what 'carriage returns' are missing, but
> if you're referring to the html that renders as double spaces being
> rendered single spaced, that's lynx too. If you prefer a double spaced
> rendering, you can put
>
> HTML_TO_PLAIN_TEXT_COMMAND = '/usr/bin/links -dump %(filename)s'
>
> in mm_cfg.py to replace lynx with links (verify that you have links and
> that is the correct path). The rest of what links does with this is
> very similar to what lynx does.
>   

I do have links, and I can confirm that is the correct path (at least on 
my Debian Sarge boxes).  It appears that his Eudora client is configured 
to send in html, and that the two carriage returns after each line are 
being written as two break tags; I understand why that would render as a 
single carriage return in that regard.  Nonetheless, I will check out 
links - thank you for the suggestion.
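
For anyone else who wants to check the path before changing the setting, here is a minimal sketch. It assumes the command template quoted above; the helper names build_command and converter_available are made up for illustration and are not part of Mailman. It only shows how the %(filename)s placeholder expands and whether the program exists. It is standalone Python 3 and is not how Mailman itself invokes the command.

```python
import os
import shlex

# The setting as suggested in this thread; in practice it goes in mm_cfg.py.
HTML_TO_PLAIN_TEXT_COMMAND = '/usr/bin/links -dump %(filename)s'


def build_command(template, filename):
    """Expand %(filename)s and split the result into an argument list.

    Hypothetical helper for illustration only.
    """
    expanded = template % {'filename': shlex.quote(filename)}
    return shlex.split(expanded)


def converter_available(template):
    """Return True if the program named in the template exists and is executable."""
    program = build_command(template, 'placeholder.html')[0]
    return os.path.isfile(program) and os.access(program, os.X_OK)


if __name__ == '__main__':
    print(build_command(HTML_TO_PLAIN_TEXT_COMMAND, '/tmp/msg.html'))
    print('converter found:', converter_available(HTML_TO_PLAIN_TEXT_COMMAND))
```

If the second line prints False, the path in the template is probably wrong for that machine and should be corrected before the setting goes into mm_cfg.py.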

> If in the other cases, we are converting utf-8 html to plain text, I
> think that explains why we 'lose' the character set. I think there are
> definitely problems in this area. It didn't look like that was the
> issue from a previous post, but as the messages were incomplete in
> that example, I may have misinterpreted what was happening.
>
>   

There are other users from Europe whose character sets are UTF-8 and who 
are sending in HTML to boot (like one of the messages I posted before), 
so that was happening (I believe) for those users - but as I mentioned 
before, I want to get something I can consistently reproduce before 
bothering the list with any of that.  Bugs that can't be reproduced are 
of no help IMHO, and I'm sure in the eyes of most developers as well.  I 
will try to get an example of this for you where I've been CC'ed in 
addition to the list, if I can.
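
Until I have a real example, here is a small sketch of one common way UTF-8 text ends up showing odd characters: the bytes are read back under a different character set. This is only an assumption about the general mechanism, not a confirmed diagnosis of what happens on these lists, and the function names are made up for illustration.

```python
def misdecode(text, wrong_charset='latin-1'):
    """Encode text as UTF-8, then decode the bytes with the wrong charset."""
    return text.encode('utf-8').decode(wrong_charset)


def force_ascii(text):
    """Encode text as UTF-8, then decode as us-ascii, replacing invalid bytes."""
    return text.encode('utf-8').decode('ascii', errors='replace')


if __name__ == '__main__':
    sample = 'café'
    # Each non-ASCII character is two or more bytes in UTF-8, so it shows up
    # as several unrelated characters when read as latin-1.
    print(misdecode(sample))
    # Read as us-ascii, the same bytes cannot be represented at all and are
    # replaced by the Unicode replacement character.
    print(force_ascii(sample))
```

Pure ASCII text passes through both functions unchanged, which would fit with only some posters being affected.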

> If your client would post plain text or even multipart/alternative, we
> wouldn't have to convert the html.
>
>   

True; the problem is there is a very large user base for the lists in 
question and very few are the technical types (read: would not be able 
to, under any circumstance, change the output style to something 
non-HTML in their MUA), so it seems like an unfortunate but necessary 
evil. 

> See <http://www.expita.com/nomime.html#eudora5> (hopefully it also
> applies to Eudora 7.)
>
>   
Thanks for the reference on that.  I appreciate all of your advice and time.

Best Regards,
Ryan

-- 
Ryan Steele                         
Systems Administrator               steele at agora-net.com
AgoraNet, Inc.                      (302) 224-2475
314 E. Main Street, Suite 1         (302) 224-2552 (fax)
Newark, DE 19711                    http://www.agora-net.com


