Legacy data parsing

gov Gov at mailinator.com
Fri Jul 8 14:31:14 EDT 2005


Hi,

I've just started to learn programming and was told this was a good
place to ask questions :)

Where I work, we receive large quantities of data, all of which is
currently printed on large, obsolete dot matrix printers.  This is a
problem because replacement parts will not be available for much longer.

So I'm trying to create a program which will capture the fixed-width
text file data, sort it by report type (there are several), and convert
it into a different format which can be printed normally or viewed on
a computer.

I've been reading up on the regular expression module and on ways to
manipulate strings.  However, it has been difficult to think of a way
to extract an address.

Here's an example of the raw text that I have to work with:


ADDRESS INFORMATION/RENSEIGNEMENTS SUR L'ADRESSE:
****************************

FOR/POUR AL/LA:  20
  CORR TYP:  A1B 2C3      P:3 CHNGD/CHANG
  LANG: E CONS/REGR:             #######
  MRS XXX X XXXXXXX
  ### XXXXXXXXX ST                      DD   TYP:               P:6 CHNGD/CHANG
  MONCTON NB                            LANG: E CONS/REGR:             #######
                                        MRS XXX X          XXXXXXX
                                        #####
                                        ####
                                        ###-###-#

ADDRESS INFORMATION/RENSEIGNEMENTS SUR L'ADRESSE:
****************************

FOR/POUR AL/LA:  30
  BOTH TYP:  A1B 2D3      P:3 CHNGD/CHANG
  LANG: E CONS/REGR:             #######
  MISS XXXX XXXXX
  ### XXXXXXXX ST
  MONCTON NB

EARNINGS VITAL INFORMATION/RENSEIGNEMENTS ESSENTIELS SUR LES GAINS:
***********

(Each # stands for any number, and the X's are just regular text.)
I would like to extract the address information, but the two different
blocks of text on the right-hand side are difficult to remove.  I think
it would be easier if I could just extract a fixed rectangle of
characters (a given range of lines and a given range of columns), but I
don't have a clue as to how to go about it.
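
To make the "fixed square" idea concrete, here is a minimal sketch that
cuts a rectangle out of a list of lines using plain string slicing.  The
function name extract_block and the split at column 40 are assumptions
read off the sample above, not something the report format documents,
and the sample lines are rebuilt with ljust so the column positions are
explicit.

```python
def extract_block(lines, top, bottom, left, right):
    """Return rows top..bottom-1, columns left..right-1, as a list of strings."""
    block = []
    for line in lines[top:bottom]:
        # Pad short lines so the slice never runs off the end, then
        # drop the trailing padding again.
        padded = line.rstrip("\n").ljust(right)
        block.append(padded[left:right].rstrip())
    return block


# Sample rows; column 40 as the start of the right-hand block is an assumption.
sample = [
    "FOR/POUR AL/LA:  20",
    "  CORR TYP:  A1B 2C3      P:3 CHNGD/CHANG",
    "  LANG: E CONS/REGR:             #######",
    "  MRS XXX X XXXXXXX",
    "  ### XXXXXXXXX ST".ljust(40) + "DD   TYP:               P:6 CHNGD/CHANG",
    "  MONCTON NB".ljust(40) + "LANG: E CONS/REGR:             #######",
    "".ljust(40) + "MRS XXX X          XXXXXXX",
]

# Left-hand address: rows 3-5, columns 0-39.
left_address = [s.strip() for s in extract_block(sample, 3, 6, 0, 40)]
print(left_address)
# ['MRS XXX X XXXXXXX', '### XXXXXXXXX ST', 'MONCTON NB']

# Right-hand block: rows 4-6, columns 40-79.
right_block = extract_block(sample, 4, 7, 40, 80)
```

Slicing by position sidesteps regular expressions entirely, which may
suit fixed-width data better, provided the columns really are stable
from one record to the next.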

If anyone could suggest methods for sorting this type of data, it would
be appreciated.
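
As a starting point for sorting by report type, the sketch below groups
lines into sections.  It assumes that a section heading is any line
immediately followed by a line made up only of asterisks, which is true
of the sample above but may not hold for every report.  The name
split_sections is hypothetical.

```python
def split_sections(lines):
    """Group lines into (heading, body_lines) pairs.

    Assumption: a heading is a line whose next line consists only of
    asterisks. Lines before the first heading are dropped.
    """
    sections = []
    i = 0
    while i < len(lines):
        nxt = lines[i + 1].strip() if i + 1 < len(lines) else ""
        if nxt and set(nxt) == {"*"}:
            sections.append((lines[i].strip(), []))
            i += 2  # skip the heading and its row of asterisks
            continue
        if sections:
            sections[-1][1].append(lines[i].rstrip("\n"))
        i += 1
    return sections


report = [
    "ADDRESS INFORMATION/RENSEIGNEMENTS SUR L'ADRESSE:",
    "****************************",
    "",
    "FOR/POUR AL/LA:  30",
    "  MISS XXXX XXXXX",
    "",
    "EARNINGS VITAL INFORMATION/RENSEIGNEMENTS ESSENTIELS SUR LES GAINS:",
    "***********",
]

sections = split_sections(report)
for heading, body in sections:
    print(heading, len(body))
```

Each section's body can then be handed to whatever code understands
that heading, so the different report types can be handled separately.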




More information about the Python-list mailing list