[PyPy-issue] [issue541] bz2.decompress returns partially garbled content

Christian pypy-dev-issue at codespeak.net
Wed May 26 12:39:09 CEST 2010


New submission from Christian <wvapex at gmx.de>:

The test data is the first bz2 block (as produced by bzip2recover) from a recent
dump of the English Wikipedia.

Test case:
  import bz2
  data = open("rec00001enwiki-latest-pages-articles.xml.bz2", "rb").read()
  print len(bz2.decompress(data))

produces 
  "901179" with CPython 2.5.5
  "1876027" with pypy 1.2.0
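
One way to narrow this down might be to check whether the incremental API shows the same behaviour as the one-shot bz2.decompress call. The following is only a sketch: decompress_in_chunks and its chunk_size default are names made up for illustration, and it assumes the input holds a single bz2 stream.

```python
import bz2

def decompress_in_chunks(data, chunk_size=8192):
    """Decompress one bz2 stream by feeding it in fixed-size pieces.

    Hypothetical helper for cross-checking bz2.decompress; assumes
    `data` contains a single stream.
    """
    decomp = bz2.BZ2Decompressor()
    parts = []
    for start in range(0, len(data), chunk_size):
        try:
            parts.append(decomp.decompress(data[start:start + chunk_size]))
        except EOFError:
            # The stream already ended; ignore whatever follows it.
            break
    # data[:0] is an empty string of the same type as the input, so the
    # join works with both Python 2 str and Python 3 bytes.
    return data[:0].join(parts)
```

If len(decompress_in_chunks(data)) matches the CPython figure while bz2.decompress(data) does not, that would point at the one-shot code path rather than at the decompressor object.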

The first 24576 bytes of output seem to be identical. Then 24 bytes of garbage
follow in the PyPy output where the original has 17808 bytes of text, then some
common text ...
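
The offset where the two outputs diverge can be located with something like the sketch below. It assumes the decompressed output of each interpreter has been written to a file first; first_mismatch, compare_files and the file paths passed to them are hypothetical names, not part of any library.

```python
def first_mismatch(a, b):
    """Return the offset of the first differing byte, or None if equal."""
    shared = min(len(a), len(b))
    for i in range(shared):
        # Compare one-byte slices so this works on Python 2 and Python 3.
        if a[i:i + 1] != b[i:i + 1]:
            return i
    if len(a) != len(b):
        # One output is a strict prefix of the other.
        return shared
    return None

def compare_files(path_a, path_b):
    """Read two dumped outputs (hypothetical paths) and compare them."""
    fa = open(path_a, "rb")
    fb = open(path_b, "rb")
    try:
        return first_mismatch(fa.read(), fb.read())
    finally:
        fa.close()
        fb.close()
```

For example, compare_files("out-cpython.bin", "out-pypy.bin") would return the first offset at which the two dumps differ, or None if they are identical.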

The test machine is running Debian sid with libbz2 1.0.5-4 on the amd64 platform.

----------
effort: ???
files: rec00001enwiki-latest-pages-articles.xml.bz2
messages: 1763
nosy: Aw7WsX1tvC, pypy-issue
priority: bug
release: 1.2
status: unread
title: bz2.decompress returns partially garbled content

_______________________________________________________
PyPy development tracker <pypy-dev-issue at codespeak.net>
<https://codespeak.net/issue/pypy-dev/issue541>
_______________________________________________________
-------------- next part --------------
A non-text attachment was scrubbed...
Name: rec00001enwiki-latest-pages-articles.xml.bz2
Type: application/x-bzip
Size: 237985 bytes
Desc: not available
URL: <http://mail.python.org/pipermail/pypy-issue/attachments/20100526/5b2ae2ce/attachment.bin>

