Legacy data parsing

Miki Tebeka miki.tebeka at zoran.com
Fri Jul 8 15:04:57 EDT 2005


Hello gov,

> Here's an example of the raw text that I have to work with:
> 
> 
> ADDRESS INFORMATION/RENSEIGNEMENTS SUR L'ADRESSE:
> ****************************
> 
> FOR/POUR AL/LA:  20
>   CORR TYP:  A1B 2C3      P:3 CHNGD/CHANG
>   LANG: E CONS/REGR:             #######
>   MRS XXX X XXXXXXX
>   ### XXXXXXXXX ST                      DD   TYP:               P:6 CHNGD/CHANG
>   MONCTON NB                            LANG: E CONS/REGR:             #######
>                                         MRS XXX X          XXXXXXX
>                                         #####
>                                         ####
>                                         ###-###-#
> 
> ADDRESS INFORMATION/RENSEIGNEMENTS SUR L'ADRESSE:
> ****************************
> 
> FOR/POUR AL/LA:  30
>   BOTH TYP:  A1B 2D3      P:3 CHNGD/CHANG
>   LANG: E CONS/REGR:             #######
>   MISS XXXX XXXXX
>   ### XXXXXXXX ST
>   MONCTON NB
> 
> EARNINGS VITAL INFORMATION/RENSEIGNEMENTS ESSENTIELS SUR LES GAINS:
> ***********
> 
> (the # = any number, and the X's are just regular text)
> I would like to extract the address information, but the two different
> text objects on the right hand side are difficult to remove.  I think
> it would be easier if I could just extract a fixed square of
> information, but I don't have a clue as to how to go about it.
> 
> If anyone could give me suggestions as to methods in sorting this type
> of data, it would be appreciated.
Maybe regular expressions are too difficult for this. I'd try one of the
parsing toolkits (such as PLY or PyParsing); one of them might be more
suitable for the job.
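
If a full parser turns out to be more than you need, the "fixed square"
idea from your message can also be sketched with plain string slicing.
This is only a rough sketch under assumptions of mine: that the
right-hand blocks always start at column 40 (as they appear to in your
sample), that every record begins with an "ADDRESS INFORMATION" header,
and that each line arrives intact rather than wrapped. The names
left_column and split_records are made up for illustration.

```python
def left_column(lines, width=40):
    """Keep only the first `width` characters of each line (the left-hand block).

    The default of 40 is an assumption read off the sample data.
    """
    cropped = (line[:width].rstrip() for line in lines)
    return [line for line in cropped if line]


def split_records(text, header="ADDRESS INFORMATION"):
    """Group lines into records, starting a new record at each header line."""
    records = []
    current = None
    for line in text.splitlines():
        if line.startswith(header):
            current = []
            records.append(current)
        elif current is not None:
            current.append(line)
    return records


# A small made-up record shaped like the sample; the right-hand text
# starts at column 40.
sample = (
    "ADDRESS INFORMATION/RENSEIGNEMENTS SUR L'ADRESSE:\n"
    "FOR/POUR AL/LA:  20\n"
    "  MRS XXX X XXXXXXX\n"
    "  123 XXXXXXXXX ST" + " " * 22 + "DD   TYP:               P:6\n"
    "  MONCTON NB" + " " * 28 + "LANG: E CONS/REGR:\n"
)

for record in split_records(sample):
    for line in left_column(record):
        print(line)
```

Anything that follows the last address block (such as the EARNINGS
section) would end up in the final record, so a real script would also
need a rule for where a record stops.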

HTH.
--
------------------------------------------------------------------------
Miki Tebeka <miki.tebeka at zoran.com>
http://tebeka.bizhat.com
The only difference between children and adults is the price of the toys