splitting tables

Bengt Richter bokr at oz.net
Sun Feb 8 11:40:22 EST 2004


On Sun, 8 Feb 2004 14:41:55 +0000 (UTC), robsom <no.mail at no.mail.it> wrote:

>On Sat, 07 Feb 2004 22:10:17 +0100, Karl Pflästerer wrote:
>
>> What do you want to be done?  To see if an item is missing is trivial:
>> just check the length of the split line (a list).  But what the right
>> action is in that case is up to you: should the user be asked? Is
>> the same column always missing? Is it possible to distinguish the
>> entries without errors from each other so the program can decide which
>> column is missing?
>
>Ok, I'll try to give some more detail. These are files with data from
>field measurements and contain information about location, time,
>measurement, measurement flag, error, detection limit, calibration and
>other stuff like that. The problem is that they are made by different
>groups and are not always consistent in their format and I'm trying to
>write code that is as general as possible.
>When a table has fixed-width columns and each element is correctly aligned
>as in the example I showed you, the problem is solved by slicing the
>string as Skip Montanaro suggested in his answer (thanks!), but this is
>not always the case. For example I can have something like this:
>
>47.455677 456.67
>47.4558 453.8
>47.46789 -9999
>47.4567 456
>
>where -9999 (or something similar) indicates there is a blank, one space
>divides the columns and the elements can have a different number of
>digits. This is of course a worst-case scenario :)
>That is why I used split in the beginning, but then I run into the other
>problem, when there is a missing value.
>Any suggestions will be much appreciated, thanks
>
Maybe a modified regex that takes into account particular field formats?
A regex will search for things in order, so you can set one up to match
special things like -9999 while still allowing -9999.9 etc.
What do you know about each field and the separations? Is there always a full
set of fields, even if some are blank?
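
A rough sketch of what I mean, assuming whitespace-separated numeric fields
and a bare -9999 as the missing-value marker. The names FIELD and parse_line
are made up for illustration, and the pattern would need adjusting for your
real field formats:

```python
import re

# Hypothetical pattern: the sentinel alternative comes first, so it is tried
# before the general number alternative at each position.
FIELD = re.compile(r"""
    (?P<missing>-9999(?![.\d]))       # bare -9999, but not -9999.9 or -99990
  | (?P<number>[-+]?\d+(?:\.\d*)?)    # any other integer or decimal value
""", re.VERBOSE)

def parse_line(line):
    """Return the fields of one line as floats, with None for the sentinel."""
    values = []
    for m in FIELD.finditer(line):
        if m.group("missing") is not None:
            values.append(None)            # assumed meaning: blank measurement
        else:
            values.append(float(m.group("number")))
    return values

if __name__ == "__main__":
    for sample in ("47.455677 456.67", "47.46789 -9999", "47.4567 -9999.9"):
        print(parse_line(sample))
```

With this, a line such as "47.46789 -9999" gives 47.46789 followed by None,
while -9999.9 is still read as an ordinary number. You could then compare
len(parse_line(line)) with the number of columns you expect to catch lines
where a field is absent altogether.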
 
(BTW, my other post misleads in implying that line.rstrip('\n') is necessary
to get the regex to match).

Regards,
Bengt Richter


