[Tutor] Gurus of one liner- MOVe (blah) from one token to another and REnumber tokens

Alan ldapguru at yahoo.com
Thu Nov 24 10:55:00 CET 2005


Dear gurus and innovators of the Python one-liner,

I have about 150 lines of Python that extract text from a large file.
I need a few more lines to clean the file first, to avoid the problems
the script is running into.

Overview
There is a large, badly organized text that I am trying to organize so
that the Python script can process it. I attempted to arrange it in the
following form, which the master script understands.

Keywords:
##### is a number from 1 through 99999
|H paragraphs
|F reFerence
|R Rating

BEFORE I reorganized it with a global search and replace,
each set of tokens looked like this:

#####  paragraph
F reference
R rating

NOW (the form the master script understands)

|H##### paragraph
|F reference
|R rating

Notice there is no ##### in |F and |R.
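
To make the layout above concrete, here is a minimal sketch of how the
tagged lines could be read into (tag, number, text) fields. The names
TAG_RE and parse_fields are made up for illustration and are not part of
the master script. It assumes every field starts at the beginning of a
line with |H, |F or |R, and that untagged lines continue the previous
field.

```python
import re

# Matches a tagged line such as "|H00100 text", "|H 00100 text" or "|F text".
# Caveat (assumption): leading digits right after the tag are always read
# as the set number, so a reference that starts with a digit is misparsed.
TAG_RE = re.compile(r"^\|([HFR])\s*(\d*)\s*(.*)$")


def parse_fields(text):
    """Return a list of (tag, number, body) tuples.

    Untagged, non-blank lines are appended to the previous field.
    """
    fields = []
    for line in text.splitlines():
        m = TAG_RE.match(line)
        if m:
            tag, number, body = m.groups()
            fields.append([tag, number, body])
        elif fields and line.strip():
            fields[-1][2] += " " + line.strip()
    return [tuple(f) for f in fields]
```

The number is kept as a string so that leading zeros survive; it is
empty for |F and |R lines.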

PROBLEMS
Phase I
PROBLEM 1
The |H paragraph (which may span several lines) contains some words in
parentheses, such as (xyz blah words), and these may also span several
lines:
….( blah blah
blah blah) …

We need to move the parenthesized text (xyz blah words) to the end of
the |F reference.


Example
BEFORE

|H 00100 a friend in need is a friend indeed (author means both young \
and old) so select the best friend as soon as you can blah
|F Old London book
|R Cool

AFTER processing
|H 00100 "a friend in need is a friend indeed so select the best friend
as soon as you can blah"
|F Old London book (author means both young and old)
|R Cool
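
One possible approach to Problem 1, as a sketch only: cut every
parenthesized note out of the |H paragraph and append it to the end of
the |F reference that follows, as the problem statement asks. The names
move_parentheticals, PAREN_RE and FIELD_RE are invented here. It assumes
the parentheses are not nested, that a trailing backslash inside a note
is only a line-continuation mark, and that a set whose |H is not
directly followed by |F should be left alone for the order check. It
does not add the quotes or re-wrap the paragraph.

```python
import re

# A parenthesized note without nested parentheses; may span lines.
PAREN_RE = re.compile(r"\s*\(([^()]*)\)")
# Start of a field: a line beginning with |H, |F or |R.
FIELD_RE = re.compile(r"^\|[HFR]", re.MULTILINE)


def move_parentheticals(text):
    """Move (notes) from each |H paragraph to the end of the next |F."""
    starts = [m.start() for m in FIELD_RE.finditer(text)]
    if not starts:
        return text
    bounds = starts + [len(text)]
    chunks = [text[a:b] for a, b in zip(bounds, bounds[1:])]
    for i in range(len(chunks) - 1):
        head, ref = chunks[i], chunks[i + 1]
        if not (head.startswith("|H") and ref.startswith("|F")):
            continue  # out-of-order set: leave it untouched
        # Collapse line breaks and continuation backslashes inside notes.
        notes = [" ".join(n.replace("\\", " ").split())
                 for n in PAREN_RE.findall(head)]
        if not notes:
            continue
        chunks[i] = PAREN_RE.sub("", head)
        body = ref.rstrip()
        suffix = "".join(" (%s)" % n for n in notes)
        chunks[i + 1] = body + suffix + ref[len(body):]
    # Keep any text that precedes the first tagged line.
    return text[:starts[0]] + "".join(chunks)
```

Because the note in the example contains the only line break of the |H
paragraph, removing it leaves the paragraph on one long line; re-wrapping
would be a separate step.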

PROBLEM 2
I need to find out where the order is broken so I can go and fix it by
hand, i.e. any set that is not in the order |H##### |F |R should be
written to ErrorOrderLogFile, for example

|H##### paragraph
|H paragraph
|R rating

or any other wrong order.

For example, run the new cleaning script on this input:
|H00299 paragraph
|F Reference
|R Rating

|H00300 paragraph
|H paragraph
|H rating

Then cat ErrorOrderLogFile should show:
bad set orders
|H00300 paragraph
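
A rough sketch of the order check for Problem 2 follows. The function
names find_bad_sets and write_error_log are hypothetical. It assumes a
new set begins at every |H that is followed by a number, and that a good
set has exactly the tags H, F, R in that order. An |H paragraph whose
text happens to begin with a digit would be mistaken for a new set.

```python
import re

TAG_RE = re.compile(r"^\|([HFR])")
# An |H line that carries a set number, e.g. "|H00300" or "|H 00300".
NUMBERED_H_RE = re.compile(r"^\|H\s*\d+")


def find_bad_sets(text):
    """Return the first line of every set whose tags are not H, F, R."""
    sets = []  # each entry is [first_line, list_of_tags]
    for line in text.splitlines():
        m = TAG_RE.match(line)
        if not m:
            continue  # continuation line or blank line
        if NUMBERED_H_RE.match(line) or not sets:
            sets.append([line, []])
        sets[-1][1].append(m.group(1))
    return [first for first, tags in sets if tags != ["H", "F", "R"]]


def write_error_log(text, log_path="ErrorOrderLogFile"):
    """Write the bad sets to the log file and return them."""
    bad = find_bad_sets(text)
    with open(log_path, "w") as log:
        log.write("bad set orders\n")
        for first_line in bad:
            log.write(first_line + "\n")
    return bad
```

Only the first line of each bad set is logged, which matches the log
shown above; logging the whole set would be a small change.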


Phase II
PROBLEM 3
Once I have fixed the order by hand, I need to renumber all the sets,
say from 00001 to 99999,
in this format:

|H00001 paragraph
|F00001 reference
|R00001 rating

|H99999 paragraph
|F99999 reference
|R99999 rating
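
For Problem 3, a renumbering pass might look like the sketch below. The
name renumber and its parameters start and width are assumptions made
for illustration. It presumes the order has already been fixed, so that
every |H line opens a new set and the |F and |R lines that follow belong
to it. Any old number on an |H line is dropped; on |F and |R lines only
digits attached directly to the tag are dropped, so a reference that
starts with a number after a space is kept.

```python
import re

# |H line: optional old number, with or without a space before it.
H_RE = re.compile(r"^\|H\s*\d*\s*(.*)$")
# |F or |R line: optional old number attached directly to the tag.
FR_RE = re.compile(r"^\|([FR])\d*\s*(.*)$")


def renumber(text, start=1, width=5):
    """Renumber sets sequentially; |F and |R get the number of their |H."""
    counter = start - 1
    out = []
    for line in text.splitlines():
        h = H_RE.match(line)
        fr = FR_RE.match(line)
        if h:
            counter += 1  # every |H line opens a new set
            number = str(counter).zfill(width)
            line = ("|H" + number + " " + h.group(1)).rstrip()
        elif fr:
            number = str(counter).zfill(width)
            line = ("|" + fr.group(1) + number + " " + fr.group(2)).rstrip()
        out.append(line)  # untagged lines pass through unchanged
    return "\n".join(out) + ("\n" if text.endswith("\n") else "")
```

Running it a second time on its own output gives the same result, so it
is safe to rerun after further hand edits.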


 
