[Tutor] parse text file

Colin Talbert talbertc at usgs.gov
Thu Jun 3 16:45:52 CEST 2010


Hello Steven,
        Thanks for the reply.  Also this is my first post to tutor at python 
so I'll reply all in the future.


> However, a file of that size changes things drastically. You can't
> expect to necessarily be able to read the entire 9.2 gigabyte BZ2 file
> into memory at once, let alone the unpacked 131 GB text file, EVEN if
> your computer has more than 9.2 GB of memory. So your tests need to
> take this into account.

I thought that with "for uline in input_file", each line would be read
into memory one at a time, not the entire file.
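
A minimal sketch of that line-at-a-time behaviour, written for Python 3
rather than the Python 2 used elsewhere in this thread. The sample file
name and its contents are made up for illustration; they are not the
planet file.

```python
import bz2
import os
import tempfile

# Build a small throwaway archive so the sketch is self-contained.
# (The file name and contents are hypothetical.)
lines = [b"<node id='%d'/>\n" % i for i in range(1000)]
path = os.path.join(tempfile.mkdtemp(), "sample.osm.bz2")
with bz2.BZ2File(path, "wb") as out:
    out.writelines(lines)

# Iterating yields one decompressed line per loop step, so the loop
# never needs the whole file in memory at once.
line_count = 0
longest = 0
with bz2.BZ2File(path, "rb") as input_file:
    for uline in input_file:
        line_count += 1
        longest = max(longest, len(uline))

print(line_count, longest)
```

Only the running counters survive each iteration, which is why this
pattern is usually safe for files far larger than available memory,
provided no single line is itself enormous.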



> I'm pretty sure that this is not your code, because you can't call len()
> on a bz2 file. If you try, you get an error:

You are so correct.  I'd been trying numerous things to read in this file 
and had deleted the code that I meant to put here and so wrote this from 
memory incorrectly.  The code that I wrote should have been:

import bz2
input_file = bz2.BZ2File(r'C:\temp\planet-latest.osm.bz2','rb')
data = input_file.read()  # named "data" to avoid shadowing the built-in str
print len(data)

This does indeed return only 900000.

That is also the total you get when you sum the lengths of all the lines
returned by looping over the file:


import bz2
input_file = bz2.BZ2File(r'C:\temp\planet-latest.osm.bz2','rb')
lengthz = 0
for uline in input_file:
    lengthz = lengthz + len(uline)

print lengthz
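
A variation on the same count, sketched for Python 3: reading fixed-size
chunks instead of lines keeps memory use bounded even if the file has
very long lines. The helper name decompressed_size, the 1 MiB chunk size,
and the sample file are assumptions for illustration only.

```python
import bz2
import os
import tempfile


def decompressed_size(path, chunk_size=1024 * 1024):
    """Return the number of decompressed bytes in a .bz2 file,
    holding at most chunk_size bytes of data at a time."""
    total = 0
    with bz2.BZ2File(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # an empty bytes object signals end of file
                break
            total += len(chunk)
    return total


# Hypothetical sample data, larger than one chunk, to exercise the loop.
payload = b"x" * 2500000
sample_path = os.path.join(tempfile.mkdtemp(), "sample.bz2")
with bz2.BZ2File(sample_path, "wb") as out:
    out.write(payload)

total = decompressed_size(sample_path)
print(total)
```

If this total and the line-by-line total agree on a small file but both
stop short on the real one, the problem is more likely in how the file
is being decompressed than in the loop itself.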


Thanks again for your help, and sorry for the bad code in the previous
submission.


Colin Talbert
GIS Specialist
US Geological Survey - Fort Collins Science Center
2150 Centre Ave. Bldg. C
Fort Collins, CO 80526

(970) 226-9425
talbertc at usgs.gov




From: Steven D'Aprano <steve at pearwood.info>
To: tutor at python.org
Date: 06/02/2010 03:42 PM
Subject: Re: [Tutor] parse text file
Sent by: tutor-bounces+talbertc=usgs.gov at python.org



Hi Colin,

I'm taking the liberty of replying to your message back to the list, as
others hopefully may be able to make constructive comments. When
replying, please ensure that you reply to the tutor mailing list rather
than the individual.


On Thu, 3 Jun 2010 12:20:10 am Colin Talbert wrote:

> > Without seeing your text file, and the code you use to read the text
> > file, there's no way of telling what is going on, but I can guess
> > the most likely causes:
>
> Since the file is 9.2 gig it wouldn't make sense to send it to you. 

And I am very glad you didn't try *smiles*

However, a file of that size changes things drastically. You can't
expect to necessarily be able to read the entire 9.2 gigabyte BZ2 file
into memory at once, let alone the unpacked 131 GB text file, EVEN if
your computer has more than 9.2 GB of memory. So your tests need to
take this into account.

> > (2) There's a bug in your code so that you stop reading after
> > 900,000 bytes.
>         The code is simple enough that I'm pretty sure there is not a
> bug in it.
>
>         import bz2
>         input_file = bz2.BZ2File(r'C:\temp\planet-latest.osm.bz2','rb')
>         print len(input_file)
>
> returns 900000

I'm pretty sure that this is not your code, because you can't call len() 
on a bz2 file. If you try, you get an error:


>>> x = bz2.BZ2File('test.bz2', 'w')  # create a temporary file
>>> x.write("some data")
>>> x.close()
>>> input_file = bz2.BZ2File('test.bz2', 'r')  # open it
>>> print len(input_file)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: object of type 'bz2.BZ2File' has no len()


So whatever your code actually is, I'm fairly sure it isn't what you say 
here.



-- 
Steven D'Aprano