[Tutor] help related to unicode using python

Steven D'Aprano steve at pearwood.info
Wed Mar 20 19:48:29 CET 2013


On 20/03/13 22:38, nishitha reddy wrote:
> Hi all,
> I'm working with Unicode in Python.
> I have some text files in Telugu, and I want to split all the lines of
> those files into Telugu words.
> I also need to classify all of the words using some identifiers. Can
> anyone send a solution for that?


Probably not. I would be surprised if anyone here knows what Telugu is,
or the rules for splitting Telugu text into words. The Natural Language
Toolkit (NLTK) may be able to handle it.

You could try doing the splitting and classifying yourself. If Telugu uses
space-delimited words like English, you can do it easily:

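# -*- coding: utf-8 -*-
# (Python 2 needs the coding line above for non-ASCII literals.)
data = u"ఏఐఒ ఓఔక ఞతణథ"
words = data.split()  # split on whitespace, giving a list of three words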
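
The same idea can be stretched to whole files. What follows is only a
sketch, and it rests on assumptions I cannot check from here: that your
files are saved as UTF-8, and that whitespace really does separate Telugu
words. The names split_words and words_from_file, and the file name
"telugu.txt", are made up for illustration.

```python
# -*- coding: utf-8 -*-
import io


def split_words(lines):
    """Return one list of words per non-blank line.

    Assumes words are separated by whitespace.
    """
    result = []
    for line in lines:
        words = line.split()  # no argument: split on any run of whitespace
        if words:             # skip blank lines
            result.append(words)
    return result


def words_from_file(path, encoding="utf-8"):
    """Read a text file and split each line into words.

    The UTF-8 default is an assumption; pass the real encoding if it differs.
    """
    # io.open decodes the bytes into Unicode text on both Python 2 and 3.
    with io.open(path, encoding=encoding) as f:
        return split_words(f)


# Small in-memory demonstration, so no file is needed to try it out.
sample = [u"ఏఐఒ ఓఔక\n", u"\n", u"ఞతణథ\n"]
for words in split_words(sample):
    print(len(words))
```

With a real file you would call something like words_from_file("telugu.txt")
and get back a list of lists, one inner list per non-blank line.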

As for classifying the words, I have no idea, sorry.


-- 
Steven

