[Tutor] fixed or variable length fields?

Michael Janssen Janssen@rz.uni-frankfurt.de
Sat Mar 1 07:58:01 2003


On Sat, 1 Mar 2003, Paul Tremblay wrote:

> ob<nu<nu<nu<0001<{
> cw<nu<nu<nu<rtf>true<rtf
> cw<nu<nu<nu<macintosh>true<macintosh
> cw<nu<nu<nu<font-table>true<font-table

I took a quick look at an RTF document and it looks different. The
font table, for example, appears in a line like this:
{\rtf1\ansi\ansicpg1252\deff0{\fonttbl  [continues spaceless]
{\f0\fswiss\fprq2\fcharset0 Arial;}{\f1\fnil\fcharset2 Symbol;}}

Are your "lines of tokens" data from an intermediate step (or is RTF
really that unstandardized)? Do they represent an atomic rewrite of the
information in hairy lines like the ones above?

Now my question (which is unrelated to the subject of this thread, by
the way :-):

If it is intermediate data, why is it of type string? If you have
already processed the information in an earlier step (possibly rewriting
it into a "normalized" format), you might want to save the results to
disk, but you needn't start again from disk and split each line into
computer-understandable data structures a second time. Just keep working
with the data structures from the earlier step.

Or is it necessary in order to save memory? Or did I miss something else?
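
A minimal sketch of what I mean, assuming the token lines really look
like the four you posted. The names parse() and first_pass() are made up
for illustration and are not taken from your program. The idea is to
split each line once and let every later pass work on the resulting
tuples:

```python
import re

# Sample data copied from the four token lines quoted above.
SAMPLE = """\
ob<nu<nu<nu<0001<{
cw<nu<nu<nu<rtf>true<rtf
cw<nu<nu<nu<macintosh>true<macintosh
cw<nu<nu<nu<font-table>true<font-table
"""

def parse(line):
    """Split one token line on "<" or ">" into a tuple of fields."""
    return tuple(re.split("[<>]", line.rstrip("\n")))

# Parse once ...
tokens = [parse(line) for line in SAMPLE.splitlines()]

def first_pass(tokens):
    """Hypothetical pass: names of the control words that are set to 'true'."""
    return [t[4] for t in tokens if t[0] == "cw" and t[5] == "true"]

# ... and every later pass reads tuples instead of re-splitting strings.
names = first_pass(tokens)
print(names)
```

Whether this fits depends on how much data there is. If the file does not
fit into memory, writing intermediate files between passes may still be
the better choice.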

Michael

>
> (Fields are delimited with "<" and ">" because all "<" and ">" in the
> data have been converted to "&lt;" and "&gt;".)
>
> I will make several passes through this file to convert the data.
>
> Each time I read a line, I will use string slicing, and sometimes the
> split method:
>
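> if line[12:22] == 'font-table':    # 'font-table' is 10 characters long
> 	info = line[12:]               # 'font-table>true<font-table'
> 	fields = info.split(">")       # ['font-table', 'true<font-table']
> 	if fields[1].startswith('true'):
> 		pass  # do something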
>
> If I use fixed-length fields, then I won't have to do any splitting. I
> also know that in Perl there is a way to use 'pack' and 'unpack' to
> quickly access fixed fields. I have never used this, and don't know if
> the pack in Python is similar.
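
Python's struct module has pack and unpack, and they can read
fixed-width records. A minimal sketch, assuming each line is exactly five
four-character ASCII fields separated by single colons, as in your
fixed-length example. RECORD and unpack_line() are hypothetical names:

```python
import struct

# Five 4-byte fields; each "x" skips one separator byte (the ":").
# "=" asks for standard sizes with no alignment padding.
RECORD = struct.Struct("=4sx4sx4sx4sx4s")

def unpack_line(line):
    """Return the five fixed-width fields of one 24-character line."""
    raw = line.rstrip("\n").encode("ascii")
    return tuple(field.decode("ascii") for field in RECORD.unpack(raw))

fields = unpack_line("ctrw:null:null:true:fntb\n")
print(fields)
```

Plain slicing such as line[15:19] reaches the same fields without struct.
Whether either one beats split() on your data is something to measure
rather than assume.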
>
> If fixed fields did give me a speed increase, readability would
> certainly suffer. For example, the above 4 lines of tokens might look
> like:
>
> opbr:null:null:null:0001
> ctrw:null:null:true:rtfx
> ctrw:null:null:true:mact
> ctrw:null:null:true:fntb
>
> Instead of 'macintosh', I have 'mact'; instead of 'font-table', I have
> 'fntb'.
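
One way to soften the readability problem, sketched under the assumption
that the last field always sits in columns 20 to 23: keep the short codes
in the data, but give them readable names in the program. LONG_NAMES and
describe() are invented for this example:

```python
# Hypothetical lookup table. The short codes are the ones from the
# fixed-length example above; the long names are the original tokens.
LONG_NAMES = {
    "rtfx": "rtf",
    "mact": "macintosh",
    "fntb": "font-table",
}

def describe(line):
    """Return the readable name for the last fixed-width field of a line."""
    code = line[20:24]
    # Codes that are not in the table (such as "0001") are returned as-is.
    return LONG_NAMES.get(code, code)

print(describe("ctrw:null:null:true:fntb"))
print(describe("opbr:null:null:null:0001"))
```

The data file stays terse, while the code that tests for a font table can
still say 'font-table'.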
>
> Thanks
>
> Paul
>
> --
>
> ************************
> *Paul Tremblay         *
> *phthenry@earthlink.net*
> ************************
>