Browsing text ; Python the right tool?
Jeff Shannon
jeff at ccvcorp.com
Wed Jan 26 19:59:01 EST 2005
John Machin wrote:
> Jeff Shannon wrote:
>
>>[...] For ~10 or fewer types whose spec
>>doesn't change, hand-coding the conversion would probably be quicker
>>and/or more straightforward than writing a spec-parser as you
>>suggest.
>
> I didn't suggest writing a "spec-parser". No (mechanical) parsing is
> involved. The specs that I'm used to dealing with set out the record
> layouts in a tabular fashion. The only hassle is extracting that from a
> MSWord document or a PDF.
The "specs" I'm used to dealing with are inconsistent enough that it's
more work to "massage" them into strict tabular format than it is to
retype and verify them. Typically it's one or two file types, with
one or two record types each, from each vendor -- and of course no
vendor uses anything similar to any other, nor is there a standardized
way for them to specify what they *do* use. Everything is almost
completely ad-hoc.
>>If, on the other hand, there are many record types, and/or those
>>record types are subject to changes in specification, then yes, it'd
>>be better to parse the specs from some sort of data file.
>
> "Parse"? No parsing, and not much code at all: The routine to "load"
> (not "parse") the layout from the layout.csv file into dicts of dicts
> is only 35 lines of Python code. The routine to take an input line and
> serve up an object instance is about the same. It does more than the
> OP's browsing requirement already. The routine to take an object and
> serve up a correctly formatted output line is only 50 lines of which
> 1/4 is comment or blank.
There's a tradeoff between the effort involved in writing multiple
custom record-type classes, and the effort necessary to write the
generic loading routines plus the effort to massage the
specifications into a regular, machine-readable format. I suppose
that "parsing" may not precisely be the correct term here, but I was
using it in parallel to, say, ConfigParser and optparse. Either
you're writing code to translate some sort of received specification
into a usable format, or you're manually pushing bytes around to get
them into a format that your code *can* translate. I'd say that my
creation of custom classes is just a bit further along a continuum
than your massaging of specification data -- I'm just massaging it
into Python code instead of CSV tables.
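
To make the table-driven end of that continuum concrete, here is a rough sketch of loading a layout table into a dict of dicts and using it to slice a fixed-width line. This is not John's code: the column names (rectype, field, start, width), the H/D record types, and the function names load_layout and split_record are all invented for illustration, and it assumes the first character of each line identifies the record type.

```python
import csv
import io

# Hypothetical layout table: record type, field name, 0-based start offset, width.
LAYOUT_CSV = """rectype,field,start,width
H,batch_id,1,6
H,created,7,8
D,account,1,10
D,amount,11,8
"""

def load_layout(fileobj):
    """Load the layout into a dict of dicts: {rectype: {field: (start, end)}}."""
    layout = {}
    for row in csv.DictReader(fileobj):
        start = int(row["start"])
        end = start + int(row["width"])
        layout.setdefault(row["rectype"], {})[row["field"]] = (start, end)
    return layout

def split_record(line, layout):
    """Slice one fixed-width line into a dict of stripped field values."""
    fields = layout[line[0]]  # assumption: first character is the record type
    return {name: line[start:end].strip() for name, (start, end) in fields.items()}

layout = load_layout(io.StringIO(LAYOUT_CSV))
record = split_record("D0012345678 0001999", layout)
print(record)
```

The code itself is short; the cost being argued about above is getting a vendor's spec into something as regular as LAYOUT_CSV in the first place.
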
>>I suspect
>>that we're both assuming a case similar to our own personal
>>experiences, which are different enough to lead to different
>>preferred solutions. ;)
>
> Indeed. You seem to have led a charmed life; may the wizards and the
> rangers ever continue to protect you from the dark riders! :-)
Hardly charmed -- more that there's so little regularity in what I'm
given that massaging it to a standard format is almost as much work as
just buckling down and retyping it. My one saving grace is that I'm
usually able to work with delimited files, rather than
column-width-specified files. I'll spare you the rant about my many
job-related frustrations, but trust me, there ain't no picnics here!
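
For comparison, a hand-coded class for a single delimited record type might look roughly like the following. The record type, its three fields, and the pipe delimiter are made up for the example; a real vendor file would dictate its own.

```python
class PaymentRecord:
    """Hand-coded class for one hypothetical pipe-delimited record type."""

    def __init__(self, account, amount_cents, due_date):
        self.account = account
        self.amount_cents = amount_cents
        self.due_date = due_date

    @classmethod
    def from_line(cls, line, delimiter="|"):
        # Field order is fixed in code, retyped from the (hypothetical) spec.
        account, amount, due_date = line.rstrip("\n").split(delimiter)
        return cls(account.strip(), int(amount), due_date.strip())

    def to_line(self, delimiter="|"):
        # Write the record back out in the same field order.
        return delimiter.join([self.account, str(self.amount_cents), self.due_date])

rec = PaymentRecord.from_line("0012345678|1999|20050126\n")
print(rec.to_line())
```

With only a handful of record types, each one is a dozen or so lines like this, which is why retyping and verifying can compete with a generic loader.
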
Jeff Shannon
Technician/Programmer
Credit International