Is there no compression support for large sized strings in Python?

Gerald Klix Gerald.Klix at klix.ch
Thu Dec 1 10:05:51 EST 2005


Did you consider the mmap library?
Perhaps it is possible to avoid holding these big strings in memory.
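For example (just an untested sketch; the file name and the chunked
processing are made up to illustrate the idea):

import mmap

fObj = open(r'd:\strSize500MB.dat', 'rb')
# map the whole file; the OS pages the data in on demand, so the
# whole 500 MB need not be resident in RAM at once
mappedData = mmap.mmap(fObj.fileno(), 0, access=mmap.ACCESS_READ)
for indx in range(0, len(mappedData), 1048576):
    chunk = mappedData[indx:indx + 1048576]  # slices like a string
    # ... feed chunk to a compressor object here ...
mappedData.close()
fObj.close()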
BTW: AFAIK it is not possible in 32-bit Windows for an ordinary program
to allocate more than 2 GB. That restriction dates back to the Jurassic
MIPS processors, which reserved the upper 2 GB of the address space for
the OS.

HTH,
Gerald

Claudio Grondi wrote:
> "Fredrik Lundh" <fredrik at pythonware.com> wrote in message
> news:mailman.1444.1133442090.18701.python-list at python.org...
> 
>>Claudio Grondi wrote:
>>
>>
>>>What started as a simple test of whether it is better to load
>>>uncompressed data directly from the hard disk or to load compressed
>>>data and uncompress it (Windows XP SP 2, Pentium 4 3.0 GHz system
>>>with 3 GByte RAM) seems to show that none of the compression
>>>libraries available in Python really works for large sized
>>>(i.e. 500 MByte) strings.
>>>
>>>Test the provided code and see for yourself.
>>>
>>>At least on my system:
>>> zlib fails to decompress, raising a memory error
>>> pylzma fails to decompress, running endlessly and consuming 99% of CPU time
>>> bz2 fails to compress, running endlessly and consuming 99% of CPU time
>>>
>>>The same works with a 10 MByte string without any problem.
>>>
>>>So what? Is there no compression support for large sized strings in
>>>Python?
> 
>>you're probably measuring Windows' memory management rather than the
>>compression libraries themselves (Python delegates all memory
>>allocations larger than 256 bytes to the system).
>>
>>I suggest using incremental (streaming) processing instead; from what
>>I can tell, all three libraries support that.
>>
>></F>
> 
> 
> Have solved the problem with bz2 compression the way Fredrik suggested:
> 
> import bz2
> 
> fObj = file(r'd:\strSize500MBCompressed.bz2', 'wb')
> objBZ2Compressor = bz2.BZ2Compressor()
> lstCompressBz2 = []
> # feed the compressor in 1 MByte chunks instead of passing the
> # whole 500 MByte string in one call
> for indx in range(0, len(strSize500MB), 1048576):
>     lowerIndx = indx
>     upperIndx = indx + 1048576
>     if upperIndx > len(strSize500MB): upperIndx = len(strSize500MB)
>     lstCompressBz2.append(
>         objBZ2Compressor.compress(strSize500MB[lowerIndx:upperIndx]))
> #:for
> lstCompressBz2.append(objBZ2Compressor.flush())
> strSize500MBCompressed = ''.join(lstCompressBz2)
> fObj.write(strSize500MBCompressed)
> fObj.close()
> 
> :-)
> 
> so I suppose that the decompression problems can also be solved that
> way.
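> For the decompression side, something along these lines might work (an
> untested sketch, mirroring the compression code above):
> 
> import bz2
> 
> objBZ2Decompressor = bz2.BZ2Decompressor()
> lstDecompressed = []
> fObj = file(r'd:\strSize500MBCompressed.bz2', 'rb')
> # read and decompress the file in 1 MByte chunks, so the whole
> # compressed string is never held in memory at once
> while True:
>     strChunk = fObj.read(1048576)
>     if not strChunk: break
>     lstDecompressed.append(objBZ2Decompressor.decompress(strChunk))
> #:while
> fObj.close()
> strSize500MB = ''.join(lstDecompressed)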
> 
> But this still doesn't answer for me what the core of the problem was,
> how to avoid it, and what memory allocation limits should be considered
> when working with large strings.
> Is it actually the case that on systems other than Windows 2000/XP
> there is no problem with the original code I provided?
> Maybe a good reason to go for Linux instead of Windows? Do e.g. SuSE or
> Mandriva Linux also have a limit on the memory a single Python process
> can use?
> Please let me know about your experience.
> 
> Claudio
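> 
> P.S.: On Linux the current per-process address space limit can
> presumably be queried with the standard resource module (a minimal,
> untested sketch; not available on Windows):
> 
> import resource  # Unix only
> 
> # soft and hard limit on the process' total address space, in bytes;
> # resource.RLIM_INFINITY means no limit is imposed
> softLimit, hardLimit = resource.getrlimit(resource.RLIMIT_AS)
> print softLimit, hardLimit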
> 
> 


