Reading by positions plain text files

javivd javiervandam at gmail.com
Sun Dec 12 10:02:13 EST 2010


On Dec 1, 7:15 am, Tim Harig <user... at ilthio.net> wrote:
> On 2010-12-01, javivd <javiervan... at gmail.com> wrote:
>
> > On Nov 30, 11:43 pm, Tim Harig <user... at ilthio.net> wrote:
> >> On 2010-11-30, javivd <javiervan... at gmail.com> wrote:
>
> >> > I have a case now in which another file has been provided (besides the
> >> > database) that tells me in which column of the file every variable is,
> >> > because there isn't any blank or tab character that separates the
> >> > variables; they are stuck together. This second file specifies the
> >> > variable name and its position:
>
> >> > VARIABLE NAME      POSITION (COLUMN) INFILE
> >> > var_name_1                 123-123
> >> > var_name_2                 124-125
> >> > var_name_3                 126-126
> >> > ..
> >> > ..
> >> > var_name_N                 512-513 (last positions)
>
> >> I am unclear on the format of these positions.  They do not look like
> >> what I would expect from absolute references in the data.  For instance,
> >> 123-123 may only contain one byte??? which could change for different
> >> encodings and how you mark line endings.  Frankly, the use of the
> >> word columns in the header suggests that the data *is* separated by
> >> line endings rather than absolute position and the position refers to
> >> the line number. In which case, you can use splitlines() to break up
> >> the data and then address the proper line by index.  Nevertheless,
> >> you can use file.seek() to move to an absolute offset in the file,
> >> if that really is what you are looking for.
>
> > I work in a survey research firm. The data I'm talking about has a lot
> > of 0-1 variables, meaning yes or no to a lot of questions, so only one
> > position of a character is needed (not byte), which explains the 123-123
> > kind of positions for a lot of variables.
>
> Then file.seek() is what you are looking for; but, you need to be aware of
> line endings and encodings as indicated.  Make sure that you open the file
> using whatever encoding was used when it was generated or you could have
> problems with multibyte characters affecting the offsets.

I've tried your advice and something is wrong. Here is my code,

# write a small three-line test file
f = open(r'c:\somefile.txt', 'w')
f.write('0123456789\n0123456789\n0123456789')
f.close()

f = open(r'c:\somefile.txt', 'r')

for line in f:
    f.seek(3,0)
    print f.read(1)  # just to check whether it prints the right column

I used .seek() in this manner, but it is not working.

Let me put the problem another way. I have a .txt file with NO
headers and NO blanks between any columns. But I know that
columns, say, 13 to 15 hold the variable VARNAME_1 (a three-digit
variable, of course). How can I extract that column into a list called
VARNAME_1?

Obviously, this should extend to all the positions and variables I
have to extract from the file.
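
To make the goal concrete, here is a rough sketch of the result I am
after, using plain string slicing on each line instead of seek(). The
LAYOUT dictionary, the VARNAME_2 entry and the function name are made up
for illustration. I am assuming that positions are 1-based and inclusive,
that every record sits on its own line, and that each column is exactly
one character.

```python
# Hypothetical layout: variable name -> (first column, last column),
# 1-based and inclusive, as in the position file described above.
# VARNAME_2 and its position are invented for this example.
LAYOUT = {
    'VARNAME_1': (13, 15),
    'VARNAME_2': (16, 16),
}


def read_fixed_width(lines, layout):
    """Return {variable name: list of string values, one per record}."""
    columns = dict((name, []) for name in layout)
    for line in lines:
        record = line.rstrip('\r\n')  # drop the line ending, keep the data
        for name, (first, last) in layout.items():
            # Column 13 is index 12, and the slice end is exclusive,
            # so columns 13-15 become record[12:15].
            columns[name].append(record[first - 1:last])
    return columns


# Two made-up records standing in for lines read from the data file.
sample = ['0123456789ABCDEFGHIJ\n', 'abcdefghijklmnopqrst\n']
result = read_fixed_width(sample, LAYOUT)
```

With a real file, the open file object could be passed in place of
sample, since iterating over it yields one line at a time. The values
come back as strings, so 0-1 variables would still need int() if numbers
are wanted.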

Thanks!

J


