[Mailman-Users] UnicodeDecodeError with Mailman 2.1 and Python 2.6

David Magda dmagda at ee.ryerson.ca
Tue Sep 1 19:26:12 CEST 2015


[Actually sending the reply to the list as well.]

On Tue, September 1, 2015 12:15, Stephen J. Turnbull wrote:
> David Magda writes:
>
>  > When I run 'bin/arch mylistname' I get the following:
>  >
>  > [...]
>  > figuring article archives
>  > 2005-October
>  > /usr/local/mailman-2.1.20/Mailman/Archiver/HyperDatabase.py:176:
>  > UnicodeWarning: Unicode equal comparison failed to convert both
>  > arguments to Unicode - interpreting them as being unequal
>  >  self.dict = marshal.load(fp)
>  > /usr/local/mailman-2.1.20/Mailman/Archiver/HyperDatabase.py:74:
>  > UnicodeWarning: Unicode equal comparison failed to convert both
>  > arguments to Unicode - interpreting them as being unequal
>  >  self.sorted.sort()
>  > Updating index files for archive [2004-December]
>  > [...]
>  > Updating HTML for article 214
>  > Pickling archive state into
>  > /usr/local/mailman-2.1.20/archives/private/mylistname/pipermail.pck
>  > Traceback (most recent call last):
>
> It would appear that you have a non-ASCII character in the header of the
> 214th message of December 2004 (or maybe it's the 214th message
> overall).  That message doesn't conform to the mail standards and
> should be repaired.
>
> Since pipermail is constructing an index, I would guess that you have
> a localized date header, a display name with an accented character in
> it, or a subject with an accented character in it.  The character in
> question is e with a circumflex (ê) in the Latin-1 set; I don't know if
> that's the intended character set, though.

Searching the mbox with `grep --color='auto' -P -n "\xea"`, I found only
one place where \xea appears in a header: a Subject line. I manually
edited the mbox (after making a copy first), replacing the accented e
with an ASCII "e", but I still get the error (I did this before e-mailing
the list). There are other occurrences of \xea, but none in any headers.
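For anyone who wants to repeat the check without grep also matching message
bodies, here is a rough sketch of the same search restricted to header
blocks. It is written for Python 3 and run separately from Mailman, and it
assumes a classic mbox layout: each message starts with a "From " line at
the top of the file or after a blank line, and its headers end at the first
empty line. The function name find_nonascii_headers is my own, not anything
in Mailman.

```python
import sys


def find_nonascii_headers(lines):
    """Return (line_number, line) pairs for header lines with bytes >= 0x80.

    Assumes a classic mbox layout: a message starts with a "From " line at
    the start of the file or after a blank line, and its header block runs
    until the first empty line.
    """
    hits = []
    in_headers = False
    prev_blank = True  # the start of the file counts as a message boundary
    for lineno, raw in enumerate(lines, start=1):
        line = raw.rstrip(b"\r\n")
        if prev_blank and line.startswith(b"From "):
            in_headers = True   # new message: its header block begins
        elif in_headers and line == b"":
            in_headers = False  # blank line ends the header block
        elif in_headers and any(byte > 0x7F for byte in line):
            hits.append((lineno, line))  # non-ASCII byte inside a header
        prev_blank = (line == b"")
    return hits


if __name__ == "__main__" and len(sys.argv) > 1:
    # Read the mbox as raw bytes so no decoding step can hide the culprit.
    with open(sys.argv[1], "rb") as fp:
        for lineno, line in find_nonascii_headers(fp):
            print("%d: %r" % (lineno, line))
```

Saved as a script under any name, it takes the mbox path as its only
argument. Continuation lines of folded headers are still inside the header
block, so they are checked too.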

The 214 is the message count from a state file: every time I rerun the
command the number is higher, but it seems to die in the same place. In
the middle of the output there is a "UnicodeWarning":

[...]
#00104 <... at acm.org>
figuring article archives
2005-September
#00105 <... at mikep>
figuring article archives
2005-September
#00106 <... at acm.org>
figuring article archives
2005-September
#00107 <... at mail.gmail.com>
figuring article archives
2005-October
/usr/local/mailman-2.1.20/Mailman/Archiver/HyperDatabase.py:176:
UnicodeWarning: Unicode equal comparison failed to convert both arguments
to Unicode - interpreting them as being unequal
  self.dict = marshal.load(fp)
/usr/local/mailman-2.1.20/Mailman/Archiver/HyperDatabase.py:74:
UnicodeWarning: Unicode equal comparison failed to convert both arguments
to Unicode - interpreting them as being unequal
  self.sorted.sort()
Updating index files for archive [2004-December]
  Date
  Subject
  Author
  Thread
Computing threaded index
Updating HTML for article 757
Updating HTML for article 864
Updating HTML for article 866
Updating index files for archive [2005-April]
  Date
[...]

Then the error at the end:

[...]
Updating index files for archive [2005-August]
[...]
Updating HTML for article 947
Updating HTML for article 840
Updating index files for archive [2005-September]
  Date
  Subject
  Author
  Thread
Computing threaded index
Updating HTML for article 841
Updating HTML for article 842
Updating HTML for article 843
Updating HTML for article 952
Updating HTML for article 845
Updating HTML for article 966
Updating HTML for article 846
Updating HTML for article 847
Updating HTML for article 848
Updating HTML for article 957
Updating HTML for article 958
Updating HTML for article 961
Updating HTML for article 962
Updating HTML for article 963
Updating HTML for article 964
Updating HTML for article 965
Updating HTML for article 851
Updating HTML for article 960
Updating HTML for article 861
Updating HTML for article 859
Updating HTML for article 860
Updating HTML for article 970
Pickling archive state into
/usr/local/mailman-2.1.20/archives/private/reactome-help/pipermail.pck
Traceback (most recent call last):
  File "bin/arch", line 201, in <module>
    main()
  File "bin/arch", line 189, in main
    archiver.processUnixMailbox(fp, start, end)
  File "/usr/local/mailman-2.1.20/Mailman/Archiver/pipermail.py", line 586, in processUnixMailbox
    self.add_article(a)
  File "/usr/local/mailman-2.1.20/Mailman/Archiver/pipermail.py", line 638, in add_article
    article.parentID = parentID = self.get_parent_info(arch, article)
  File "/usr/local/mailman-2.1.20/Mailman/Archiver/pipermail.py", line 658, in get_parent_info
    if self.database.hasArticle(archive, article.in_reply_to):
  File "/usr/local/mailman-2.1.20/Mailman/Archiver/HyperDatabase.py", line 279, in hasArticle
    self.__openIndices(archive)
  File "/usr/local/mailman-2.1.20/Mailman/Archiver/HyperDatabase.py", line 257, in __openIndices
    t = DumbBTree(os.path.join(arcdir, archive + '-' + i))
  File "/usr/local/mailman-2.1.20/Mailman/Archiver/HyperDatabase.py", line 66, in __init__
    self.load()
  File "/usr/local/mailman-2.1.20/Mailman/Archiver/HyperDatabase.py", line 185, in load
    self.__sort(dirty=1)
  File "/usr/local/mailman-2.1.20/Mailman/Archiver/HyperDatabase.py", line 74, in __sort
    self.sorted.sort()
UnicodeDecodeError: 'ascii' codec can't decode byte 0xea in position 3: ordinal not in range(128)

It gets to 2005-September, creates the index files, enumerates 22
articles, pickles the archive state, and then dies. The 23rd and later
messages don't appear to contain any non-ASCII characters.
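My guess at the mechanism: the failure is in self.sorted.sort() inside
DumbBTree, and in Python 2, ordering a byte string that contains 0xea
against a unicode string makes Python decode the byte string as ASCII,
which raises exactly this UnicodeDecodeError (an equality comparison only
emits the UnicodeWarning seen earlier). The general workaround is to turn
every key into text before sorting. The sketch below illustrates that idea
in Python 3, where mixing bytes and str in a sort raises TypeError instead.
It is not a patch to HyperDatabase.py; to_text and safe_sort are
hypothetical helpers, and decoding as Latin-1 is an assumption about the
archive's charset.

```python
def to_text(key, encoding="latin-1"):
    """Return key as text, decoding byte strings with an assumed encoding.

    Latin-1 is an assumption: it maps every byte 0x00-0xFF to a code point,
    so decoding never fails, but it may not be the archive's real charset.
    """
    if isinstance(key, bytes):
        return key.decode(encoding)
    return key


def safe_sort(keys):
    """Sort keys that may mix bytes and str, comparing them all as text.

    The original objects are returned unchanged; only the sort key is
    normalized, so nothing stored in the index is rewritten.
    """
    return sorted(keys, key=to_text)


# A plain sorted() on this list would raise TypeError in Python 3.
print(safe_sort([b"caf\xea", "cafe", b"abc"]))
```

This only sidesteps the comparison; it does not fix whatever put the raw
bytes into the index in the first place.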

Can I patch pipermail.py or HyperDatabase.py (or ???) in some way to
work around this? I have LANG=en_US.UTF-8 and LC_TIME=en_DK.UTF8 in my
shell environment: does that make a difference?

This used to work just fine, so I'm wondering what changed with the OS
upgrade. I should have a pre-upgrade copy of the VM in case that's
helpful.

Thanks for the help.




