[Tutor] Extract strings from a text file

Emad Nawfal (عماد نوفل) emadnawfal at gmail.com
Fri Feb 27 16:08:07 CET 2009


On Fri, Feb 27, 2009 at 10:01 AM, Paul McGuire <ptmcg at austin.rr.com> wrote:

> Emad wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> Since I'm learning Pyparsing, this was a nice exercise. I've written this
> elementary script, which does the job well for the data we have:
>
> from pyparsing import Literal, OneOrMore, Word, alphas
>
> ID_TAG = Literal("<ID>")
> FULL_NAME_TAG1 = Literal("<Full")
> FULL_NAME_TAG2 = Literal("name>")
> END_TAG = Literal("</")  # only the start of the closing tag is matched
> word = Word(alphas)
> pattern1 = ID_TAG + word + END_TAG  # <ID>Joseph</...
> pattern2 = FULL_NAME_TAG1 + FULL_NAME_TAG2 + OneOrMore(word) + END_TAG  # <Full name>Joseph Smith</...
> result = pattern1 | pattern2
>
> with open("lines.txt") as lines:  # replace with your file name
>     for line in lines:
>         myresult = result.searchString(line)
>         if myresult:
>             print(myresult[0])  # first match found on this line
>
>
> # This prints out
> ['<ID>', 'Joseph', '</']
> ['<Full', 'name>', 'Joseph', 'Smith', '</']
>
> # You can access the individual elements of the lists to pick whatever you want.
>
> Emad -
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>
> Welcome to the world of pyparsing!  Your program is a very good first cut at
> this problem.  Let me add some suggestions (more like hints toward more
> advanced concepts in your pyparsing learning):
> - Look into Group, as in Group(OneOrMore(word)); this will add organization
> and structure to the returned results.
> - Results names will make it easier to access the separate parsed fields.
> - Check out the makeHTMLTags and makeXMLTags helper methods. These do more
> than just wrap angle brackets around a tag name: they also handle attributes
> in varying order, case variability (makeHTMLTags matches tag names
> caselessly), and (of course) varying whitespace. The OP didn't explicitly
> say this was XML data, but the sample does look suspiciously like it.
>
> If you only easy_install'ed pyparsing or used the binary Windows installer,
> please go back to SourceForge and download the source .ZIP or tarball
> package; these include the full examples and htmldoc directories that the
> auto-installers omit.
>
> Good luck in your continued studies!
> -- Paul
>
>
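A minimal sketch of how Paul's three hints (Group, results names, makeXMLTags) might fit together. It is not code from the thread. It assumes the input looks like <ID>Joseph</ID> and <Full name>Joseph Smith</Full name>, inferred from the output printed above; the SAMPLE text and the extract() helper are hypothetical. Because "Full name" contains a space, it is not a legal XML tag name, so only <ID> uses makeXMLTags here, and the full-name tags are matched with plain suppressed literals.

```python
# Sketch only: SAMPLE and extract() are hypothetical, inferred from the
# output shown in the thread; the real file format may differ.
from pyparsing import Group, OneOrMore, Suppress, Word, alphas, makeXMLTags

word = Word(alphas)

# makeXMLTags returns an (opening tag, closing tag) pair of expressions
id_start, id_end = makeXMLTags("ID")
# word("id") attaches the results name "id" to the matched word
id_entry = id_start + word("id") + id_end

# "<Full name>" is not a legal XML tag name (it contains a space), so it is
# matched with suppressed literals; pyparsing skips the whitespace between them
full_start = Suppress("<Full") + Suppress("name>")
full_end = Suppress("</Full") + Suppress("name>")
# Group keeps all the name words together under one results name
full_entry = full_start + Group(OneOrMore(word))("full_name") + full_end

entry = id_entry | full_entry

SAMPLE = """<ID>Joseph</ID>
<Full name>Joseph Smith</Full name>"""


def extract(text):
    """Return (field, value) pairs for every entry found in text."""
    found = []
    for match in entry.searchString(text):
        if "id" in match:  # results names can be tested like dict keys
            found.append(("ID", match["id"]))
        else:
            found.append(("Full name", " ".join(match["full_name"])))
    return found


if __name__ == "__main__":
    for field, value in extract(SAMPLE):
        print(field + ":", value)
```

With results names, each field is read by name (match["id"], match["full_name"]) rather than by position in the token list, and Group keeps a multi-word name as one unit instead of spreading it across the flat result.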
Thanks, Paul. I've read a lot ABOUT pyparsing, but doing it is different.
For me, programming is mostly just for fun; I'm a linguist surrounded by
many programmers.
I enjoy Python and Pyparsing a lot.

-- 
"No victim has ever been more repressed and alienated than the truth."
(Muhammad al-Ghazali)

Emad Soliman Nawfal
Indiana University, Bloomington

--------------------------------------------------------