Browsing text ; Python the right tool?

John Machin sjmachin at lexicon.net
Tue Jan 25 16:53:21 EST 2005


Paul Kooistra wrote:
> I need a tool to browse text files with a size of 10-20 Mb. These
> files have a fixed record length of 800 bytes (CR/LF), and contain
> records used to create printed pages by an external company.
>
> Each line (record) contains a 2-character identifier, like 'A0' or
> 'C1'. The identifier identifies the record format for the line,
> thereby allowing different record formats to be used in a textfile.
> For example:
>
> An A0 record may consist of:
> recordnumber [1:4]
> name         [5:25]
> filler       [26:800]

1. Python syntax calls these [0:4], [4:25], etc. One has to get into
the habit of deducting 1 from the start column position given in a
document.
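
A minimal sketch of that habit, assuming the document's column ranges are
1-based and inclusive at both ends. The helper name and the sample record
are made up for illustration:

```python
def col_slice(start_col, end_col):
    """Turn a 1-based, inclusive column range into a Python slice."""
    # Only the start moves: Python's end index is already exclusive.
    return slice(start_col - 1, end_col)

# Invented 800-byte A0 body: recordnumber, name, filler.
record = "0001" + "John Smith".ljust(21) + " " * 775

recordnumber = record[col_slice(1, 4)]        # same as record[0:4]
name = record[col_slice(5, 25)].rstrip()      # same as record[4:25]
```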

2. So where's the "A0"? Are the records really 804 bytes wide -- "A0"
plus the above plus CR LF? What is "recordnumber" -- can't be a line
number (4 digits -> max 10k; 10k * 800 -> only 8Mb); looks too small to
be a customer identifier; is it the key to a mapping that produces
"A0", "C1", etc?

>
> while a C1 record consists of:
> recordnumber [1:4]
> phonenumber  [5:15]
> zipcode      [16:20]
> filler       [21:800]
>
> As you see, all records have a fixed column format. I would like to
> build a utility which allows me (in a windows environment) to open a
> textfile and browse through the records (ideally with a search
> option), where each recordtype is displayed according to its
> recordformat ('Attributename: Value' format). This would mean that
> browsing from a A0 to C1 record results in a different list of
> attributes + values on the screen, allowing me to analyze the data
> generated a lot easier than I do now, browsing in a text editor with a
> stack of printed record formats at hand.
>
> This is of course quite a common way of encoding data in textfiles.
> I've tried to find a generic text-based browser which allows me to do
> just this, but cannot find anything. Enter Python; I know the language
> by name, I know it handles text just fine, but I am not really
> interested in learning Python just now, I just need a tool to do what
> I want.
>
> What I would REALLY like is a way to define standard record formats in a
> separate definition, like:
> - defining a common record length;
> - defining the different record formats (attributes, position of the
> line);

Add in the type, number of decimal places, etc as well ..

> - and defining when a specific record format is to be used, dependent
> on 1 or more identifiers in the record.
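
Such a separate definition could be plain Python data. What follows is
only a sketch under assumptions: the 2-character identifier is taken to
sit in front of the documented columns (the spec above does not say where
it lives), and Field, LAYOUTS, convert and parse are hypothetical names,
as is the "money" type with implied decimal places.

```python
from collections import namedtuple
from decimal import Decimal

# One field: name, 1-based inclusive columns as documented, type, decimals.
Field = namedtuple("Field", "name start end kind decimals")

ID_WIDTH = 2  # assumption: identifier precedes the 800 documented columns

LAYOUTS = {
    "A0": [Field("recordnumber", 1, 4, "int", 0),
           Field("name", 5, 25, "text", 0),
           Field("filler", 26, 800, "filler", 0)],
    "C1": [Field("recordnumber", 1, 4, "int", 0),
           Field("phonenumber", 5, 15, "text", 0),
           Field("zipcode", 16, 20, "text", 0),
           Field("filler", 21, 800, "filler", 0)],
}

def convert(raw, field):
    """Convert the raw column text according to the field's declared type."""
    if field.kind == "int":
        return int(raw)
    if field.kind == "money":
        # Implied decimal point: "0001250" with 2 decimals means 12.50.
        return Decimal(raw.strip()).scaleb(-field.decimals)
    return raw.rstrip()

def parse(line):
    """Pick the layout from the identifier and return (id, {name: value})."""
    rec_id, body = line[:ID_WIDTH], line[ID_WIDTH:]
    values = {f.name: convert(body[f.start - 1:f.end], f)
              for f in LAYOUTS[rec_id] if f.kind != "filler"}
    return rec_id, values
```

Keeping the layouts as data means a new record type is one more entry in
the table, not more code.
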
>
> I CAN probably build something from scratch, but if I can (re)use
> something that already exists it would be so much better and faster...
> And a utility to do what I just described would be REALLY useful in
> LOTS of environments.
>
> This means I have the following questions:
>
> 1. Does anybody know of a generic tool (not necessarily Python based)
> that does the job I've outlined?

No, but please post if you hear of one.

> 2. If not, is there some framework or widget in Python I can adapt to
> do what I want?
> 3. If not, should I consider building all this just from scratch in
> Python - which would probably mean not only learning Python, but some
> other GUI related modules?

The approach I use is along the lines of what you suggested, but without
the GUI. I have a Python script that takes layout info and an input file
and can produce an output file in one of two formats:

Format 1:
something like:
Rec:A0 recordnumber:0001 phonenumber:(123) 555-1234 zipcode:12345

This is usually much shorter than the fixed length record, because you
leave out the fillers (after checking they are blank!), and strip
trailing spaces from alphanumeric fields. Whether you leave integers,
money, date etc fields as per file or translated into human-readable
form depends on who will be reading it.

You then use a robust text editor (preferably one which supports
regular expressions in its find function) to browse the output file.
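
A rough sketch of how Format 1 might be produced. This is not my actual
script; the layout table, the function name, and the assumption that the
identifier occupies the 2 bytes in front of the documented columns are
all illustrative.

```python
# Hypothetical layout: (name, start, end) with 1-based inclusive columns.
LAYOUTS = {
    "C1": [("recordnumber", 1, 4), ("phonenumber", 5, 15),
           ("zipcode", 16, 20), ("filler", 21, 800)],
}

def format_one_line(line, id_width=2):
    """Render one fixed-width record as 'Rec:ID name:value ...'."""
    rec_id, body = line[:id_width], line[id_width:]
    parts = ["Rec:" + rec_id]
    for name, start, end in LAYOUTS[rec_id]:
        value = body[start - 1:end]
        if name == "filler":
            # Leave fillers out, but only after checking they are blank.
            if value.strip():
                raise ValueError("non-blank filler in %s record" % rec_id)
            continue
        parts.append("%s:%s" % (name, value.rstrip()))
    return " ".join(parts)

# Invented sample record.
sample = "C1" + "0001" + "01235551234" + "12345" + " " * 780
summary = format_one_line(sample)
```

With the sample above, summary is
"Rec:C1 recordnumber:0001 phonenumber:01235551234 zipcode:12345".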

Format 2:
Rec:A0
recordnumber:0001
etc etc, i.e. one field per line. Why, you ask? If you are a consumer of
such files, you can take small chunks of this and drop them into Excel;
testers can take a copy, make lots of juicy test data, and run it through
another script which makes a flat file out of it.
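
To illustrate the round trip, here is a hedged sketch of both directions:
record to one-field-per-line, and back to a flat record. The names, the
layout table and the placement of the identifier in front of the 800
documented columns are assumptions for the example, not a description of
the real scripts.

```python
RECORD_LENGTH = 800   # documented columns, excluding identifier and CR LF
ID_WIDTH = 2          # assumed position of the identifier: first 2 bytes

# Hypothetical layout: (name, start, end), 1-based inclusive, fillers omitted.
LAYOUTS = {"A0": [("recordnumber", 1, 4), ("name", 5, 25)]}

def to_field_lines(line):
    """Fixed-width record -> ['Rec:A0', 'recordnumber:0001', ...]."""
    rec_id, body = line[:ID_WIDTH], line[ID_WIDTH:]
    out = ["Rec:" + rec_id]
    for name, start, end in LAYOUTS[rec_id]:
        out.append("%s:%s" % (name, body[start - 1:end].rstrip()))
    return out

def to_flat_record(field_lines):
    """One-field-per-line text -> fixed-width record, blank-padded."""
    rec_id = field_lines[0].split(":", 1)[1]
    values = dict(l.split(":", 1) for l in field_lines[1:])
    body = [" "] * RECORD_LENGTH
    for name, start, end in LAYOUTS[rec_id]:
        width = end - start + 1
        # Pad or truncate so the field fills exactly its columns.
        body[start - 1:end] = values.get(name, "").ljust(width)[:width]
    return rec_id + "".join(body)

original = "A0" + "0001" + "John Smith".ljust(21) + " " * 775
field_lines = to_field_lines(original)
rebuilt = to_flat_record(field_lines)
```

If the layout is right, rebuilt should equal original, which makes a
handy sanity check on the layout itself.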

> 4. Or should I forget about Python and build something in another
> environment?

No way!



