[Mailman-Developers] unicode / archive problem revisited
Ron Brogden
rb at islandnet.com
Tue Dec 3 02:15:10 2002
Howdy. I am currently running Mailman 2.1b5 and am trying to sort out an
archiving issue that has cropped up.
From what I can tell the problem has been mentioned here before, but no
resolution seems to have been posted.
The problem is that the list archives (for reasons I won't bore you with)
contain a number of spam messages with all sorts of random encoding types
and other mangled garbage. When the archiver gets to the point of writing
the archive, the encoding type test raises an error and the whole archiving
process grinds to a crashing halt. These are busy lists and the mbox archive
takes a very long time to parse. There just is not enough time in the day to
search for the offending message, chop it out, and wait another 45 minutes or
more for the archives to regenerate, only to hit the next garbled header, and
so on. This will also continue to be a problem if any future spam messages
sneak in via forged headers, etc.
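One way to avoid the regenerate-and-wait cycle is to scan the mbox up front. The following is only a sketch in modern Python 3 using the standard `mailbox` and `codecs` modules; `find_suspect_messages` is a hypothetical helper, not part of Mailman. It lists messages whose declared charset Python does not recognize. Whether those are exactly the messages that leave `article.charset` unset in the archiver is an assumption.

```python
import codecs
import mailbox


def find_suspect_messages(mbox_path):
    """Return (key, charset) pairs for messages with an unknown charset.

    Hypothetical helper: it only checks the top-level Content-Type
    header of each message, not individual MIME parts.
    """
    suspects = []
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for key, msg in box.iteritems():
            charset = msg.get_content_charset()
            if charset is None:
                # No charset declared; common for plain ASCII mail.
                continue
            try:
                codecs.lookup(charset)
            except LookupError:
                # Python has no codec registered under this name.
                suspects.append((key, charset))
    finally:
        box.close()
    return suspects
```

The returned keys are the message positions within the mbox, which should make it easier to locate all candidates in one pass rather than one failed archive run at a time.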
The issue appears to be with:
/usr/local/mailman/Mailman/Archiver/HyperArch.py
Traceback (most recent call last):
  File "./bin/arch", line 173, in ?
    main()
  File "./bin/arch", line 163, in main
    archiver.close()
  File "/usr/local/mailman/Mailman/Archiver/pipermail.py", line 303, in close
    self.update_dirty_archives()
  File "/usr/local/mailman/Mailman/Archiver/pipermail.py", line 517, in update_dirty_archives
    self.update_archive(i)
  File "/usr/local/mailman/Mailman/Archiver/HyperArch.py", line 1058, in update_archive
    self.__super_update_archive(archive)
  File "/usr/local/mailman/Mailman/Archiver/pipermail.py", line 423, in update_archive
    self._update_simple_index(hdr, archive, arcdir)
  File "/usr/local/mailman/Mailman/Archiver/pipermail.py", line 444, in _update_simple_index
    self.write_index_entry(article)
  File "/usr/local/mailman/Mailman/Archiver/HyperArch.py", line 980, in write_index_entry
    subject = self.get_header("subject", article)
  File "/usr/local/mailman/Mailman/Archiver/HyperArch.py", line 1007, in get_header
    return unicode(result, article.charset)
TypeError: unicode() argument 2 must be string, not None
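The failure in the last frame can be reproduced in isolation. This is a minimal sketch in Python 3, where `str(bytes, encoding)` plays the role that `unicode(string, encoding)` played in the Python 2 of the traceback; `naive_decode` is a hypothetical stand-in for the quoted line in `get_header`, not Mailman code.

```python
def naive_decode(raw, charset):
    # Mirrors the failing call: decoding with whatever charset was recorded.
    return str(raw, charset)


try:
    naive_decode(b"Some subject line", None)
except TypeError as exc:
    # A charset of None is rejected before any decoding is attempted.
    failure = exc
else:
    failure = None
```

The point is that the error comes from the missing charset itself, not from the bytes in the header, so any message that ends up without a charset will trigger it.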
What I want is for the archiver to fall back to a default (English) charset
when it cannot figure out the encoding, so that at least the archiver will
not die.
So two questions:
What is a valid encoding type to pass as default to the unicode call?
Secondly, is there any danger in changing the fallback option to always use a
specific charset? I'd rather have gibberish than a process that dies.
Basically, around line 1007 in
"/usr/local/mailman/Mailman/Archiver/HyperArch.py" I want to change:
    if isinstance(result, types.UnicodeType):
        return result
    try:
        return unicode(result, article.charset)
to
    if isinstance(result, types.UnicodeType):
        return result
    try:
        return unicode(result, "some string") # never fail!
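The "never fail" fallback described above might look roughly like the sketch below. It is written in Python 3 and is not Mailman's code: `decode_header` and its `fallback` parameter are hypothetical names, and choosing `iso-8859-1` as the default is an assumption. That codec maps every possible byte to a character, so decoding with it cannot fail, although text that was really in another encoding will come out as gibberish. In Python 2 the closest equivalent of the decode call would be `unicode(result, charset, "replace")`.

```python
def decode_header(raw, charset, fallback="iso-8859-1"):
    """Decode raw header bytes to text without raising.

    Tries the message's own charset first, then the fallback.
    Undecodable bytes are replaced rather than raising an error.
    """
    if isinstance(raw, str):
        # Already text; nothing to decode.
        return raw
    for candidate in (charset, fallback):
        if not candidate:
            # Covers None and the empty string.
            continue
        try:
            return raw.decode(candidate, "replace")
        except (LookupError, TypeError):
            # Unknown codec name, or a charset that is not a string.
            continue
    # Last resort if even the fallback name was unusable.
    return raw.decode("ascii", "replace")
```

For example, `decode_header(b"caf\xe9", None)` falls through to the default and yields `"café"`, while a bogus charset name is skipped in the same way. The trade-off is the one already accepted above: a wrong guess produces garbled text in the index instead of a dead archiver run.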
Thanks for any suggestions.
Cheers