Best way to parse file into db-type layout?
Peter A. Schott
paschott at no.yahoo.spamm.com
Tue May 3 15:12:07 EDT 2005
Thanks for the comments. Quick notes:
1. Yes, ultimate goal is to output various fields into Excel, but I need to
enter the data into a DB as well to avoid having any strange problems. That may
be handled by a completely different process. I should have been more exact in
saying that my current goal is to get this data into a database.
2. I'm relatively new to Python and definitely new to trying to parse
Ragged-right type CSV files that are pairs of data. I'm trying to get my syntax
correct when wording my questions/phrases - please bear with me. :-) Your help
(all posters) is quite appreciated.
3. regarding read/write each row - I started thinking about working my way
through the dictionary. This file could have up to 1900 pairs of data per row.
That's an awful lot of columns to put into a database table, especially if quite
a few will be NULL. I starting thinking of stepping through each pair by each
row. The rows are identified by Letter type and an account number (always the
same field #). I would want to use those, and insert something like:
LetterType, Account #, Field#, Field Data, (current date)
I wasn't quite sure how to phrase that and apologize for being unclear. I
guess if I'm doing row-by-row, I can hold that for each row without too much
trouble.
4. No control over file format. I can define a lookup table that will match
Field# to a real name, perhaps code that into my scripts, but I'm stuck with
that. As to other posters' responses - it's ANSI-coded, CSV, No Quotes (no
Commas in values). Coming from the mainframe, I don't have to worry about
embedded special characters within the file as they can't be entered into these
fields to start with.
File always starts with letter_type, 1 - no idea why they didn't do that as
0, letter_type for consistency but it isn't my file. (Heck, I would have just
specified to output all fields all the time and avoid any need to do special
parsing :-)
Anyway, I appreciate the help. The code looks like it will work for me and I'll
be doing some basic tests to get up to speed on splitting/parsing the data and
putting it into a form that will be usable by all. I may have to write certain
rows out to a new file if they don't match what I'm looking for. I don't
anticipate any issues there - just read the row, write it to a new file if it
doesn't match my criteria.
Thanks to all. Off to search for dictionaries and databases now. :-)
-Pete
John Machin <sjmachin at lexicon.net> wrote:
> On Fri, 29 Apr 2005 18:54:54 GMT, Peter A. Schott
> <paschott at no.yahoo.spamm.com> wrote:
>
> >That looks promising.
>
> > The field numbers are pre-defined at the mainframe level.
>
> Of course. Have you managed to acquire a copy of the documentation, or
> do you have to reverse-engineer it?
>
> >This may help me get to my ultimate goal which is to pump these into a DB on a
> >row-by-row basis ( :-P )
>
> That's your *ultimate* goal? Are you running a retro-computing museum
> or something? Don't you want to *USE* the data?
>
> > I'll have to do some playing around with this. I
> >knew that it looked like a dictionary, but wasn't sure how best to handle this.
> >
> >One follow-up question: I'll end up getting multiple records for each "type".
>
> What does that mean?? If it means that more than one customer will get
> the "please settle your account" letter, and more than one customer
> will get the "please buy a spangled fritzolator, only $9.99" letter,
> you are stating the obvious -- otherwise, please explain.
>
> >Would I be referencing these by row[#][field#]?
>
> Not too sure what you mean by that -- whether you can get away with a
> (read a row, write a row) way of handling the data depends on its
> structure (like what are the relationships if any between different
> rows) and what you want to do with it -- both murky concepts at the
> moment.
>
> >
> >Minor revision to the format is that starts like:
> >###,1,1,val_1,....
>
> How often do these "minor revisions" happen? How flexible do you have
> to be? And the extra "1" means what? Is it ever any other number?
>
> >
> >
> >I think right now the plan is to parse through the file and insert the pairs
> >directly into a DB table. Something like RowID, LetterType, date, Field#,
> >Value.
>
> Again, I'd recommend you lose the "Field#" in favour of a better
> representation, ASAP.
>
> > I can get RowID and LetterType overall, date is a constant, the rest
> >would involve reading each pair and inserting both values into the table. Time
> >to hit the books a little more to get up to speed on all of this.
>
> What you need is (a) a clear appreciation of what you are trying to do
> with the data at a high level (b) then develop an understanding of
> what is the underlying data model (c) then and only then worry about
> technical details.
>
> Good luck,
> John
>
More information about the Python-list
mailing list