Simple Text Processing Help

Paul Hankin paul.hankin at gmail.com
Mon Oct 15 18:32:45 EDT 2007


On Oct 15, 10:08 pm, patrick.wa... at gmail.com wrote:
> Because of my limited Python knowledge, I will need to try to figure
> out exactly how they work for future text manipulation and for my own
> knowledge.  Could you recommend some resources for this kind of text
> manipulation?  Also, I conceptually get it, but would you mind walking
> me through
>
> > for tok in tokens:
> >     if NR_RE.match(tok) and len(chem) >= 4:
> >         chem[2:-1] = [' '.join(chem[2:-1])]
> >         yield chem
> >         chem = []
> >     chem.append(tok)

Sure: 'chem' is a list of all the data associated with one chemical.
When a token (tok) arrives that is matched by NR_RE (i.e. three groups
of digits separated by dots) and we've already collected at least 4
pieces of data, it's assumed to be the start of a new chemical. At
that point we join the name back up (as was explained in earlier
posts), 'yield chem' hands back the chemical collected so far, and a
new chemical is started (by resetting chem to an empty list). Whatever
tok is, it's then added to the end of the current chemical data. If
it's still unclear, add some print statements and watch it work.
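
For instance, here is a rough sketch of that loop with print statements
added, so you can watch 'chem' grow and get yielded. Several things here
are assumptions made for illustration and are not from the original
code: the exact NR_RE pattern (guessed from the "digits separated by
dots" description), the function name 'chemicals', the sample tokens,
and the step after the loop that hands back the last chemical. It uses
the print() function, so it needs a Python version that has it.

```python
import re

# Assumption: three groups of digits separated by dots, e.g. '1.0.0'.
# The real NR_RE from earlier in the thread may differ.
NR_RE = re.compile(r'^\d+\.\d+\.\d+$')


def chemicals(tokens):
    """Group a flat stream of tokens into one list per chemical."""
    chem = []
    for tok in tokens:
        if NR_RE.match(tok) and len(chem) >= 4:
            # Everything between the first two fields and the last one
            # is the name, so glue those tokens back into one string.
            chem[2:-1] = [' '.join(chem[2:-1])]
            print('yielding:', chem)
            yield chem
            chem = []
        chem.append(tok)
        print('after %r chem is %r' % (tok, chem))
    # Assumption: the final chemical has to be handed back after the
    # loop, since no further number token arrives to trigger the yield.
    if chem:
        if len(chem) >= 4:
            chem[2:-1] = [' '.join(chem[2:-1])]
        yield chem


# Hypothetical tokens: number, code, name words..., final value.
sample = ['1.0.0', 'A1', 'sodium', 'chloride', '58.44',
          '2.0.0', 'B2', 'acetic', 'acid', 'glacial', '60.05']
records = list(chemicals(sample))
```

The 'len(chem) >= 4' test is what lets a chemical with a one-word name
through unchanged: with exactly 4 items, chem[2:-1] is just the name
itself, so the join leaves the list as it was.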

This code uses exactly the same algorithm as Marc's code - it's just a
bit clearer (or at least, I thought so). Oh, and it yields each
chemical as a list rather than a tuple, but that makes no difference.

--
Paul Hankin



