[Mailman-Developers] unicode / archive problem revisited

Ron Brogden rb at islandnet.com
Tue Dec 3 02:15:10 2002


Howdy.  I am currently running Mailman 2.1b5 and am trying to sort out an 
archiving issue that has cropped up.  

From what I can tell the problem has been mentioned here before, but no 
resolution seems to have been posted.

The problem is that the list archives (for reasons I won't bore you with) 
contain a number of spam messages with all sorts of random encoding types 
and other mangled garbage.  When the archiver gets to the point of writing 
the archive, the encoding test raises an error and the whole archiving 
process grinds to a halt.  These are busy lists and the mbox archive takes a 
very long time to parse.  There is just not enough time in the day to search 
for the offending message, chop it out, and wait another 45 minutes or more 
for the archives to regenerate, only to hit the next garbled header.  This 
will also continue to be a problem if any future spam messages sneak in via 
forged headers, etc.
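
One way to shorten the regenerate-and-retry loop might be to pre-scan the 
mbox for messages that declare a charset Python does not recognise.  The 
following is only a rough sketch, written against the current Python standard 
library (`mailbox`, `codecs`) rather than the Python that Mailman 2.1b5 runs 
on.  The function name `find_suspect_messages` is made up for illustration 
and is not part of Mailman, and I have not confirmed that an unknown charset 
is exactly what leaves `article.charset` set to None.

```python
import codecs
import mailbox


def find_suspect_messages(mbox_path):
    """Return (index, subject, charset) for messages declaring an unknown charset."""
    suspects = []
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for index, msg in enumerate(box):
            # Charset parameter of the Content-Type header, or None if absent.
            charset = msg.get_content_charset()
            if charset is None:
                continue
            try:
                # Raises LookupError when Python has no codec by that name.
                codecs.lookup(charset)
            except LookupError:
                suspects.append((index, msg.get("Subject", ""), charset))
    finally:
        box.close()
    return suspects
```

Each returned tuple gives the position of the message in the mbox, which 
should make it easier to find and cut out by hand.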

The issue appears to be with: 

/usr/local/mailman/Mailman/Archiver/HyperArch.py 

Traceback (most recent call last):
  File "./bin/arch", line 173, in ?
    main()
  File "./bin/arch", line 163, in main
    archiver.close()
  File "/usr/local/mailman/Mailman/Archiver/pipermail.py", line 303, in close
    self.update_dirty_archives()
  File "/usr/local/mailman/Mailman/Archiver/pipermail.py", line 517, in update_dirty_archives
    self.update_archive(i)
  File "/usr/local/mailman/Mailman/Archiver/HyperArch.py", line 1058, in update_archive
    self.__super_update_archive(archive)
  File "/usr/local/mailman/Mailman/Archiver/pipermail.py", line 423, in update_archive
    self._update_simple_index(hdr, archive, arcdir)
  File "/usr/local/mailman/Mailman/Archiver/pipermail.py", line 444, in _update_simple_index
    self.write_index_entry(article)
  File "/usr/local/mailman/Mailman/Archiver/HyperArch.py", line 980, in write_index_entry
    subject = self.get_header("subject", article)
  File "/usr/local/mailman/Mailman/Archiver/HyperArch.py", line 1007, in get_header
    return unicode(result, article.charset)
TypeError: unicode() argument 2 must be string, not None

What I want is for the archiver to fall back to a default charset (English, 
say) if it cannot figure out the encoding, so that at least the archiver will 
not die.  

So two questions:

First, what is a valid encoding name to pass as the default to the unicode() 
call?  Secondly, is there any danger in changing the fallback to always use a 
specific charset?  I'd rather have gibberish than a process that dies.

Basically, around line 1007 in 
"/usr/local/mailman/Mailman/Archiver/HyperArch.py" I want to change:

    if isinstance(result, types.UnicodeType):
        return result
    try:
        return unicode(result, article.charset)

to

    if isinstance(result, types.UnicodeType):
        return result
    try:
        return unicode(result, "some string") # never fail!
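
To make the idea concrete, here is a rough standalone sketch of the kind of 
fallback I have in mind.  It is not the actual HyperArch.py code: the function 
name `decode_header_value` and the default of "us-ascii" are my own 
assumptions, and it uses `bytes.decode` so that it runs on a current Python.  
The "replace" error mode substitutes U+FFFD for undecodable bytes instead of 
raising.  As far as I know, the equivalent call here would be 
`unicode(result, charset, "replace")`.

```python
def decode_header_value(raw, charset, fallback="us-ascii"):
    """Decode raw header bytes without raising on a missing or unknown charset."""
    try:
        # A charset of None (or "") falls through to the fallback codec.
        return raw.decode(charset or fallback, "replace")
    except LookupError:
        # The declared charset is not a codec Python knows about.
        return raw.decode(fallback, "replace")
```

For example, `decode_header_value(b"hello", None)` takes the fallback path 
rather than raising a TypeError, and a bogus charset name is handled the same 
way.  The obvious cost is that non-ASCII subjects with a missing or bogus 
charset come out as replacement characters, which is the gibberish I said I 
could live with.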

Thanks for any suggestions.

Cheers


