Python student at University of Jordan.

Fri Oct 24 22:06:40 EDT 2014

On Sat, Oct 25, 2014 at 12:47 PM, heba abukaff
<habukaff50 at yahoo.com.dmarc.invalid> wrote:
> I'm having trouble using the tokenizer to find the frequency list for a URL using Arabic text. I'm using Python 2.7.2 on WinXP. I tried this code, but every time I run it, an error appears on the first line

I'm seeing two problems here. One of them may not actually be a
problem in your code, but just in how you're posting: your text has
all been rewrapped. Post the exact code, as plain text (not HTML); you
should be able to do this, but if you can't with Yahoo, try a
different email provider. Make sure we can see exactly where your code
begins and ends, so we can understand what "first line" you're looking
at - and if you copy and paste the actual error you get, that would be
extremely helpful, too. (Even if it's in Arabic. There'll be parts we
can understand.)

The second problem is that you're trying to work with non-English text
in Python 2.7. This is harder than it needs to be. Install the latest
Python (3.4) and use that instead of 2.7; the NLTK module is
compatible with 3.2+, so it should work fine. I can't be sure that
you're having trouble with bytes vs strings, because I can't see what
your code's doing (due to the wrap/indent problem), but in any case,
shifting to Python 3 gives you a much better chance of getting things
right. All you'll need to do, I suspect, is change your print
statements into function calls:

# Old style:
print "word with highest count: %s" % (fd.max())
# New style:
print("word with highest count: %s" % (fd.max()))

Easy! And only slightly harder when you send it to a different destination:

# Old style:
print>>outfile, '%s\t%d' % (t, len(t))
# New style:
print('%s\t%d' % (t, len(t)), file=outfile)

With those changes, your code will probably (I can't test it) work on
Python 3.4.
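
To show what I mean about bytes vs strings, here's a rough sketch
for Python 3. It is not your code (which I can't see properly), and
it makes several assumptions: the input is UTF-8, the sample text is
made up, a plain whitespace split stands in for the NLTK tokenizer,
and collections.Counter stands in for NLTK's FreqDist. The idea is
to decode bytes to str once, up front, and work with str after that:

```python
import io
from collections import Counter

# Made-up sample data (assumption): real input would come from your file or URL.
raw = "سلام عليكم سلام".encode("utf-8")  # bytes, as they arrive from disk or network

# Decode once, up front. From here on, everything is str (Unicode text).
text = raw.decode("utf-8")

# Naive whitespace split: a stand-in for the NLTK tokenizer, not a replacement.
words = text.split()

# Counter plays the role of nltk.FreqDist in this sketch.
counts = Counter(words)
top_word, top_count = counts.most_common(1)[0]

# StringIO stands in for an open text file so the sketch needs no real file.
outfile = io.StringIO()
print("word with highest count: %s" % top_word, file=outfile)
for t in sorted(counts):
    # len(t) counts characters here, not bytes, because t is a str.
    print('%s\t%d' % (t, len(t)), file=outfile)

report = outfile.getvalue()
print(report, end="")
```

Note that len(raw) and len(text) differ: the first counts bytes, the
second counts characters. When reading from a real file in Python 3,
passing encoding="utf-8" to open() does the decoding for you.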

ChrisA