[Tutor] Tokenizing Help
bob gailer
bgailer at gmail.com
Wed Apr 22 23:16:56 CEST 2009
William Witteman wrote:
> I need to be able to decompose a formatted text file into identifiable,
> possibly named pieces. To tokenize it, in other words. There seems to
> be a vast array of modules to do this with (simpleparse, pyparsing, etc.)
> but I cannot understand their documentation.
>
> The file format I am looking at (it is a bibliographic reference file)
> looks like this:
>
> <1> # the references are enumerated
> AU - some text
> perhaps across lines
> AB - some other text
> AB - there may be multiples of some fields
> UN - any 2-letter combination may exist; other than by exhaustion, I
> cannot anticipate what will be found
>
> What I am looking for is some help to get started, either with
> explaining the implementation of one of the modules with respect to my
> format, or with an approach that I could use from the base library.
>
What is your ultimate goal?
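
While that question is open, here is a minimal sketch of one approach using only the base library's `re` module. It is not a full parser. It assumes three things inferred from the sample above: a record starts with a line like `<1>`, a field line is two uppercase letters followed by ` - ` and text, and any other non-blank line continues the previous field. The function name `parse_references` and the output layout (an `id` plus a dict of tag -> list of values) are hypothetical choices for illustration.

```python
import re

# Assumed patterns, inferred from the sample in the question (not a known spec):
# a record header looks like "<1>", optionally followed by other text;
# a field line is a two-letter uppercase tag, a dash, then the field's text.
RECORD_RE = re.compile(r"^<(\d+)>")
FIELD_RE = re.compile(r"^([A-Z]{2})\s+-\s*(.*)$")


def parse_references(text):
    """Split text into records; each record maps a tag to a list of values."""
    records = []
    current = None    # the record currently being filled
    last_tag = None   # the tag that continuation lines are appended to
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # skip blank lines
        m = RECORD_RE.match(stripped)
        if m:
            current = {"id": int(m.group(1)), "fields": {}}
            records.append(current)
            last_tag = None
            continue
        m = FIELD_RE.match(stripped)
        if m and current is not None:
            tag, value = m.group(1), m.group(2).strip()
            # setdefault keeps repeated tags (e.g. several AB lines) in order
            current["fields"].setdefault(tag, []).append(value)
            last_tag = tag
        elif current is not None and last_tag is not None:
            # a line without a tag continues the previous field's text
            values = current["fields"][last_tag]
            values[-1] = values[-1] + " " + stripped
    return records
```

Because the tags are collected as they appear rather than from a fixed list, unexpected two-letter codes need no special handling. If the real files also use tags containing digits, the character class in `FIELD_RE` would need widening (e.g. to `[A-Z0-9]{2}`).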
--
Bob Gailer
Chapel Hill NC
919-636-4239