[Tutor] parse text file

Steven D'Aprano steve at pearwood.info
Wed Jun 2 23:41:41 CEST 2010


Hi Colin,

I'm taking the liberty of replying to your message back to the list, as 
others hopefully may be able to make constructive comments. When 
replying, please ensure that you reply to the tutor mailing list rather 
than the individual.


On Thu, 3 Jun 2010 12:20:10 am Colin Talbert wrote:

> > Without seeing your text file, and the code you use to read the text
> > file, there's no way of telling what is going on, but I can guess
> > the most likely causes:
>
> Since the file is 9.2 gig it wouldn't make sense to send it to you. 

And I am very glad you didn't try *smiles*

However, a file of that size changes things drastically. You can't 
necessarily expect to be able to read the entire 9.2 gigabyte BZ2 file 
into memory at once, let alone the unpacked 131 GB text file, EVEN if 
your computer has more than 9.2 GB of memory. So your tests need to 
take this into account.
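To make that concrete, here is a minimal sketch (Python 3, using bz2.open, which exists since 3.3) of walking through a compressed file one line at a time, so only the current line is held in memory rather than the whole file. The helper name count_lines is hypothetical; in real code you would do your actual processing where the counter is incremented.

```python
import bz2
import os
import tempfile


def count_lines(path):
    """Count lines in a bz2-compressed file without loading it all into memory."""
    count = 0
    # bz2.open in binary mode decompresses lazily; iterating over the
    # file object yields one line (as bytes) at a time.
    with bz2.open(path, "rb") as f:
        for _line in f:
            count += 1  # process each line here instead of just counting
    return count


if __name__ == "__main__":
    # Build a small test file rather than experimenting on the 9.2 GB one.
    path = os.path.join(tempfile.mkdtemp(), "test.bz2")
    with bz2.open(path, "wb") as f:
        f.write(b"line one\nline two\nline three\n")
    print(count_lines(path))  # 3
```

Testing against a tiny file you built yourself, as above, lets you check the logic in seconds before pointing it at the full planet file.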

> > (2) There's a bug in your code so that you stop reading after
> > 900,000 bytes.
>         The code is simple enough that I'm pretty sure there is not a
> bug in it.
>
>         import bz2
>         input_file = bz2.BZ2File(r'C:\temp\planet-latest.osm.bz2','rb')
>         print len(input_file)
>
> returns 900000

I'm pretty sure that this is not your code, because you can't call len() 
on a bz2 file. If you try, you get an error:


>>> x = bz2.BZ2File('test.bz2', 'w')  # create a temporary file
>>> x.write("some data")
>>> x.close()
>>> input_file = bz2.BZ2File('test.bz2', 'r')  # open it
>>> print len(input_file)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: object of type 'bz2.BZ2File' has no len()


So whatever your code actually is, I'm fairly sure it isn't what you say 
here.
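Since len() isn't available, a minimal Python 3 sketch of the two sizes you might actually want: the compressed size on disk, via os.path.getsize, and the decompressed size, found by reading the stream in fixed-size chunks and adding up the bytes. The function names and the 1 MiB chunk size are illustrative assumptions, not anything from your code.

```python
import bz2
import os
import tempfile

CHUNK_SIZE = 1024 * 1024  # 1 MiB of decompressed data per read (arbitrary choice)


def compressed_size(path):
    """Size of the .bz2 file on disk, in bytes."""
    return os.path.getsize(path)


def uncompressed_size(path, chunk_size=CHUNK_SIZE):
    """Decompress the whole file in chunks and count the bytes produced.

    This must read through the entire stream, so it is slow on a huge
    file, but memory use stays around chunk_size.
    """
    total = 0
    with bz2.open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # empty bytes means end of the decompressed data
                break
            total += len(chunk)
    return total


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "test.bz2")
    with bz2.open(path, "wb") as f:
        f.write(b"some data")
    print(uncompressed_size(path))  # 9
```

Running uncompressed_size on the real file would show directly whether reading really stops after 900,000 bytes, although on 131 GB of output it will take a long time to finish.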



-- 
Steven D'Aprano

