Reading by positions plain text files

MRAB python at mrabarnett.plus.com
Tue Nov 30 21:20:57 EST 2010


On 01/12/2010 02:03, javivd wrote:
> On Nov 30, 11:43 pm, Tim Harig<user... at ilthio.net>  wrote:
>> On 2010-11-30, javivd<javiervan... at gmail.com>  wrote:
>>
>>> I have a case now in which another file has been provided (besides the
>>> database) that tells me in which column of the file every variable is,
>>> because there isn't any blank or tab character that separates the
>>> variables; they are stuck together. This second file specifies the
>>> variable name and its position:
>>
>>> VARIABLE NAME      POSITION (COLUMN) IN FILE
>>> var_name_1                 123-123
>>> var_name_2                 124-125
>>> var_name_3                 126-126
>>> ..
>>> ..
>>> var_name_N                 512-513 (last positions)
>>
>> I am unclear on the format of these positions.  They do not look like
>> what I would expect from absolute references in the data.  For instance,
>> 123-123 may only contain one byte??? which could change for different
>> encodings and how you mark line endings.  Frankly, the use of the
>> word "columns" in the header suggests that the data *is* separated by
>> line endings rather than absolute position and the position refers to
>> the line number. In which case, you can use splitlines() to break up
>> the data and then address the proper line by index.  Nevertheless,
>> you can use file.seek() to move to an absolute offset in the file,
>> if that really is what you are looking for.
>
> I work in a survey research firm. The data I'm talking about has a lot
> of 0-1 variables, meaning yes or no to a lot of questions, so only one
> character position (not byte) is needed, which explains the 123-123
> kind of positions for a lot of variables.
>
> And no, MRAB, it's not a similar problem (at least as I understood
> it). I have to associate the positions this file gives me with the
> variable names this file gives me for those positions.
>
> Thank you both, and sorry for my English!
>
You just have to parse the second file to build a list (or dict)
containing the name, start position and end position of each variable:

     variables = [("var_name_1", 123, 123), ...]

and then work through that list, extracting the data between those
positions in the first file and putting the values in another list (or
dict).
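
As a rough sketch of those two steps (not tested against your real files):
the layout text, the sample record and the helper names parse_layout and
extract are all made up for illustration, and the code assumes the positions
are 1-based with an inclusive end, which you would need to confirm.

```python
import re

# Hypothetical layout text in the same format as the second file.
LAYOUT = """\
VARIABLE NAME      POSITION (COLUMN) IN FILE
var_name_1                 1-1
var_name_2                 2-3
..
var_name_3                 4-4
"""

# A name, some whitespace, then "start-end"; trailing notes are ignored.
LINE_RE = re.compile(r"^\s*(\S+)\s+(\d+)-(\d+)")

def parse_layout(text):
    """Return [(name, start, end), ...] from the layout text."""
    variables = []
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:  # the header and ".." filler lines do not match
            name, start, end = match.groups()
            variables.append((name, int(start), int(end)))
    return variables

def extract(record, variables):
    """Slice one data line. ASSUMES 1-based, inclusive positions."""
    return {name: record[start - 1:end] for name, start, end in variables}

variables = parse_layout(LAYOUT)
row = extract("1072", variables)  # "1072" is a made-up data line
```

Each line of the data file would then be passed through extract() in turn,
after stripping its line ending.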

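You also need to check whether the positions are 1-based or 0-based
(Python indexing is 0-based). Note too that the end positions are
inclusive (123-123 is a single character), whereas the end of a Python
slice is exclusive.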
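
One way to handle that, sketched under assumptions: to_slice and guess_base
are hypothetical helper names, and guess_base is only a heuristic that
compares the positions with the length of a data line (without its line
ending). It can return None when the positions alone do not settle the
question.

```python
def to_slice(start, end, one_based=True):
    """Convert an inclusive start-end pair into a Python slice."""
    offset = 1 if one_based else 0
    return slice(start - offset, end + 1 - offset)

def guess_base(variables, record_length):
    """Guess the indexing base from (name, start, end) tuples.

    Returns 0, 1, or None if the positions are consistent with both.
    """
    starts = [start for _, start, _ in variables]
    ends = [end for _, _, end in variables]
    if min(starts) == 0:
        return 0  # a position of 0 cannot occur in a 1-based layout
    if max(ends) == record_length:
        return 1  # a 0-based inclusive end could be at most length - 1
    return None

record = "abcdef"  # made-up data line
one_based_value = record[to_slice(2, 3)]                    # "bc"
zero_based_value = record[to_slice(2, 3, one_based=False)]  # "cd"
```

In your case, if the last variable is at 512-513 and every data line is
exactly 513 characters long, the positions are almost certainly 1-based.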


