[Tutor] help related to unicode using python
Steven D'Aprano
steve at pearwood.info
Wed Mar 20 19:48:29 CET 2013
On 20/03/13 22:38, nishitha reddy wrote:
> Hi all
> i'm working with unicode using python
> i have some txt files in telugu i want to split all the lines of that
> text files in to words of telugu
> and i need to classify all of them using some identifiers.can any one
> send solution for that
Probably not. I would be surprised if anyone here knows what Telugu is,
or the rules for splitting Telugu text into words. The Natural Language
Toolkit (NLTK) may be able to handle it.
You could try doing the splitting and classifying yourself. If Telugu uses
space-delimited words like English, you can do it easily:
    data = u"ఏఐఒ ఓఔక ఞతణథ"
    words = data.split()  # split on runs of whitespace
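If the text lives in files rather than in a string literal, the same idea might look roughly like the sketch below. It assumes the files are encoded as UTF-8 and that words are separated by whitespace; I don't know whether either holds for your data. The function name and the file name "telugu.txt" are made up for illustration.

```python
import io

def split_lines_into_words(path, encoding="utf-8"):
    """Return a list of word lists, one per non-empty line of the file.

    Assumes (hypothetically) that the file is in the given encoding and
    that words are separated by whitespace.
    """
    lines_of_words = []
    # io.open decodes bytes to Unicode text on both Python 2 and Python 3.
    with io.open(path, encoding=encoding) as f:
        for line in f:
            words = line.split()  # split on runs of whitespace
            if words:  # skip blank lines
                lines_of_words.append(words)
    return lines_of_words
```

You would call it as split_lines_into_words("telugu.txt") and get back one list of words per line. If the files turn out to use a different encoding, pass it as the second argument.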
As for classifying the words, I have no idea, sorry.
--
Steven