[TriZPUG] A beginners question I am sure

Ken M ken at mack-z.com
Sun Apr 3 21:12:08 CEST 2011


It seems the snippet I based my prototype on and the file I am now using
come from different sources.  I am not sure how I let that happen.  The
snippet was more reliable, with the <pos> tags containing the word type.
In the full sample the word type is in <tt> tags, and that is not
reliable because that tag is used for other things.
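
For reference, a minimal sketch of pairing each headword with its word
type when the <pos> tag is present, and skipping entries where it is
missing.  It assumes entries are marked up as <hw>...</hw> followed by an
optional <pos>...</pos>; the function name extract_pairs and the sample
text are made up for illustration and are not taken from the real file.

```python
import re

# Assumed markup: <hw>word</hw> optionally followed by <pos>type</pos>.
HW_RE = re.compile(r"<hw>(.*?)</hw>", re.DOTALL)
POS_RE = re.compile(r"<pos>(.*?)</pos>", re.DOTALL)


def extract_pairs(text):
    """Return (headword, word_type) tuples for entries that have a <pos> tag."""
    pairs = []
    headwords = list(HW_RE.finditer(text))
    for i, hw in enumerate(headwords):
        # Only search up to the next headword so that an entry without
        # <pos> does not pick up the tag of the entry that follows it.
        if i + 1 < len(headwords):
            end = headwords[i + 1].start()
        else:
            end = len(text)
        pos = POS_RE.search(text, hw.end(), end)
        if pos:
            pairs.append((hw.group(1).strip(), pos.group(1).strip()))
    return pairs


if __name__ == "__main__":
    # Hypothetical sample; the second entry has no <pos> tag.
    sample = (
        "<hw>Cat</hw> <pos>n.</pos> A small animal.\n"
        "<hw>Dog</hw> <tt>other use</tt>\n"
        "<hw>Run</hw> <pos>v.</pos> To move quickly.\n"
    )
    for word, kind in extract_pairs(sample):
        print("%s|%s" % (word, kind))
```

Bounding the search at the next <hw> is the design choice that matters
here: without it, a headword with no <pos> would be paired with the word
type of a later entry.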

OK, in addition to the suggested changes, let me go back to the drawing
board on which source I am using and how to parse the data I want.
Thank you for the assistance, everyone.
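
One way the "use lists and do not keep the files open" suggestion could
look, as a rough sketch written for Python 3.  The names convert and
first_letter_type, the tab-separated input format, and the latin-1 input
encoding are all assumptions for illustration, not the actual script or
the actual layout of webster.txt.

```python
def first_letter_type(line):
    """Hypothetical parser: expects 'word<TAB>type', returns (word, letter)."""
    parts = line.rstrip("\n").split("\t")
    if len(parts) != 2 or not parts[1]:
        return None  # line does not match the assumed format
    return parts[0], parts[1][0].lower()


def convert(src_path, dst_path, parse_line):
    """Read src_path, keep the rows parse_line accepts, write word|type lines.

    Returns the number of rows written so the caller can detect the
    case where nothing matched and the output file would be empty.
    """
    rows = []
    # Input encoding is an assumption; adjust it to the real file.
    with open(src_path, encoding="latin-1") as src:
        for line in src:
            row = parse_line(line)
            if row is not None:
                rows.append(row)
    # The input file is already closed here; the output file is open
    # only for the duration of the write.
    with open(dst_path, "w", encoding="utf-8") as dst:
        for word, kind in rows:
            dst.write("%s|%s\n" % (word, kind))
    return len(rows)
```

If convert returns 0, the parser matched nothing in the input, which
would show up as a 0-byte output file with no error message.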

Ken

On Sat, 2011-04-02 at 18:56 -0400, Tim Arnold wrote:
> I may be missing something, but your code looks okay to me. I looked
> for the source text and found a couple of examples that made me wonder
> though--in one the lines were broken and in the other paragraphs were
> set in a single line, but still not all the <hw> parts had <pos>
> parts.
> 
> 
> Can you post a direct link to the file? Or maybe someone else will see
> a problem in the code.  I would have used lists and not kept the files
> open, but that's just a matter of taste--your logic looks right to me.
> --Tim Arnold
> 
> 
> On Sat, Apr 2, 2011 at 6:37 PM, Ken M <ken at mack-z.com> wrote:
>         OK, so I am working on a project (just started) that is for
>         my own academic purposes for now.  I am just trying to train
>         myself in Python.  The attached code snippet converts
>         Webster's dictionary (grabbing specific components) and
>         inserts them into a pipe-delimited data file for now.  Next
>         will be a database.
>         
>         When I ran this on a 4.5 MB snippet of the file it worked
>         fine and the output file was generated correctly.  However,
>         when I run it on the entire webster.txt file (45 MB), the
>         program ends without any error message, but the output file
>         it creates is empty (0 bytes).
>         
>         The purpose for now is to build a subset dictionary file
>         that is nothing more than each word and the single-letter
>         abbreviation for its word type (n = noun, v = verb, etc.).
>         I would appreciate insight into why this is not running to
>         completion.  If anyone cares to know, I am running this on a
>         Fedora 14 box, I created and edited my .py file with vim,
>         and my Python installation is python-2.7-8.fc14.1.i686
>         (output from rpm -q python).
>         
>         Thanks in advance,
>         Ken
>         
>         P.S.  To save bandwidth I did not attach the webster.txt
>         file; however, if anyone wants to run this against it
>         themselves, it can be acquired through the Project Gutenberg
>         site.
>         
>         _______________________________________________
>         TriZPUG mailing list
>         TriZPUG at python.org
>         http://mail.python.org/mailman/listinfo/trizpug
>         http://trizpug.org is the Triangle Zope and Python Users Group
> 
> 