Does pyparsing support UNICODE strings?

Thu Aug 4 05:50:12 EDT 2005

I tested Robert Kern's code in a file, test2.py:

# -*- coding: UTF-8 -*-

from pyparsing import Word

text = "Καλημέρα, κόσμε!".decode('utf-8')
alphas = u''.join(unichr(x) for x in xrange(0x386, 0x3ce))

greet = Word(alphas) + u',' + Word(alphas) + u'!'
greeting = greet.parseString(text)
print greeting

After running this, I got the following result:
[u'\u039a\u03b1\u03bb\u03b7\u03bc\u03ad\u03c1\u03b1', u',',
u'\u03ba\u03cc\u03c3\u03bc\u03b5', u'!']
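
The escaped list above is only Python 2's repr of the unicode tokens. As a quick check (a sketch in Python 3 syntax using only the standard library, not part of the original test; the variable names are just for illustration), the escapes are the two Greek words, and every letter falls inside the code point range used to build alphas:

```python
# Python 3 sketch; names here are illustrative only.
tokens = ['\u039a\u03b1\u03bb\u03b7\u03bc\u03ad\u03c1\u03b1', ',',
          '\u03ba\u03cc\u03c3\u03bc\u03b5', '!']

# Same code point range as the alphas string in the test above.
alphas = ''.join(chr(x) for x in range(0x386, 0x3ce))

# The first and third tokens are the words matched by Word(alphas).
words = [tokens[0], tokens[2]]

# True if every letter of both words is one of the allowed characters.
all_in_range = all(ch in alphas for word in words for ch in word)
print(all_in_range)
```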

I use Windows XP SP2 (Simplified Chinese) and Python 2.4.1.
My code is as follows:
# -*- coding: UTF-8 -*-

from pyparsing import CharsNotIn

text = u"简体中文测试, 繁體中文測試!"
greet = CharsNotIn(u',!') + u',' + CharsNotIn(u',!') + u'!'
greeting = greet.parseString(text)
for x in greeting:
    print x.encode("cp936")  # or x.encode("gbk")

And the result is as follows:
简体中文测试
,
 繁體中文測試
!

Everything works correctly.
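
A possible explanation for the ParseException in the quoted message below, offered as a guess rather than a confirmed diagnosis: in the quoted code the accented letters show up as '?' ("Καλημ?ρα"). That is what happens when text is forced through cp936 with replacement, because cp936 has the plain Greek letters but not the accented ones. A Word(alphas) match would then stop at the '?', which is where the error says it expected "," (char 5). The sketch below reproduces that position using only the standard library (Python 3 syntax; first_non_alpha is a hypothetical helper, not a pyparsing function):

```python
# Python 3 sketch, standard library only.
# The greeting from the test above, written with escapes.
original = ('\u039a\u03b1\u03bb\u03b7\u03bc\u03ad\u03c1\u03b1, '
            '\u03ba\u03cc\u03c3\u03bc\u03b5!')

# Assumption: somewhere the text passed through cp936 with unencodable
# characters replaced, turning the accented letters into '?'.
damaged = original.encode('cp936', 'replace').decode('cp936')

# Same code point range as the alphas string in the test above.
alphas = ''.join(chr(x) for x in range(0x386, 0x3ce))


def first_non_alpha(text, allowed):
    """Return the index of the first character not in allowed.

    Hypothetical helper that mimics where a Word(allowed)-style
    scan starting at index 0 would stop.
    """
    for index, ch in enumerate(text):
        if ch not in allowed:
            return index
    return len(text)


stop_original = first_non_alpha(original, alphas)
stop_damaged = first_non_alpha(damaged, alphas)
print(stop_original, stop_damaged)
```

On the intact text the scan stops at the comma (index 8); on the damaged text it stops at index 5, the position reported in the traceback. If that is the cause, saving the script as UTF-8 with the accented letters intact and adding the coding line from test2.py should avoid the error.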

On 8/4/05, saddle <saddle at gmail.com> wrote:
> the code that was posted by Robert Kern
> 
> from pyparsing import Word
> text = "Καλημ?ρα, κ?σμε!".decode('utf-8')
> alphas = u''.join(unichr(x) for x in xrange(0x386, 0x3ce))
> greet = Word(alphas) + u',' + Word(alphas) + u'!'
> greeting = greet.parseString(text)
> print greeting
> 
> 
> my system default encoding is cp936 (Simplified Chinese).
> 
> On Thu, 4 Aug 2005 17:33:16 +0800
> could ildg <could.net at gmail.com> wrote:
> 
> could.net> So what's your code?
> could.net> and what's your system default encoding?
> could.net>
> could.net> On 8/4/05, saddle <saddle at gmail.com> wrote:
> could.net> > Hello, but I can't run the script. Could you tell me what the trick is, please?
> could.net> > here is the error output.
> could.net> >
> could.net> > D:\python\test>pyp
> could.net> > sys:1: DeprecationWarning: Non-ASCII character '\xce' in file D:\python\test\py
> could.net> > .py on line 3, but no encoding declared; see http://www.python.org/peps/pep-026
> could.net> > .html for details
> could.net> > Traceback (most recent call last):
> could.net> >   File "D:\python\test\pyp.py", line 9, in ?
> could.net> >     greeting = greet.parseString(text)
> could.net> >   File "C:\Python24\Lib\site-packages\pyparsing.py", line 616, in parseString
> could.net> >     loc, tokens = self.parse( instring.expandtabs(), 0 )
> could.net> >   File "C:\Python24\Lib\site-packages\pyparsing.py", line 558, in parse
> could.net> >     loc,tokens = self.parseImpl( instring, loc, doActions )
> could.net> >   File "C:\Python24\Lib\site-packages\pyparsing.py", line 1387, in parseImpl
> could.net> >     loc, exprtokens = e.parse( instring, loc, doActions )
> could.net> >   File "C:\Python24\Lib\site-packages\pyparsing.py", line 562, in parse
> could.net> >     loc,tokens = self.parseImpl( instring, loc, doActions )
> could.net> >   File "C:\Python24\Lib\site-packages\pyparsing.py", line 873, in parseImpl
> could.net> >     raise exc
> could.net> > pyparsing.ParseException: Expected "," (at char 5), (line:1, col:6)
> could.net> > On Thu, 4 Aug 2005 17:24:23 +0800
> could.net> > could ildg <could.net at gmail.com> wrote:
> could.net> >
> could.net> > could.net> OK, I made it work.
> could.net> > could.net> You're right, it works fine with Unicode.
> could.net> > could.net> pyparsing is great.
> could.net> > could.net> Thanks.
> could.net> > could.net>
> could.net> > could.net> On 8/4/05, could ildg <could.net at gmail.com> wrote:
> could.net> > could.net> > I want to parse some Chinese words.
> could.net> > could.net> > It seems that pyparsing doesn't work for me.
> could.net> > could.net> > Thank you.
> could.net> > could.net> > I'll have to use re directly; it's harder, but it'll always work.
> could.net> > could.net> >
> could.net> > could.net> > On 8/4/05, Robert Kern <rkern at ucsd.edu> wrote:
> could.net> > could.net> > > could ildg wrote:
> could.net> > could.net> > > > pyparsing is very convenient to use. But I want to find a py tool
> could.net> > could.net> > > > to parse non-English strings. Does pyparsing support UNICODE strings?
> could.net> > could.net> > > > If not, can someone tell me what py tool can do it? Thanks in advance.
> could.net> > could.net> > >
> could.net> > could.net> > > Try it!
> could.net> > could.net> > >
> could.net> > could.net> > > # vim:fileencoding=utf-8
> could.net> > could.net> > >
> could.net> > could.net> > > from pyparsing import Word
> could.net> > could.net> > >
> could.net> > could.net> > > text = "Καλημέρα, κόσμε!".decode('utf-8')
> could.net> > could.net> > > alphas = u''.join(unichr(x) for x in xrange(0x386, 0x3ce))
> could.net> > could.net> > >
> could.net> > could.net> > > greet = Word(alphas) + u',' + Word(alphas) + u'!'
> could.net> > could.net> > > greeting = greet.parseString(text)
> could.net> > could.net> > > print greeting
> could.net> > could.net> > >
> could.net> > could.net> > > --
> could.net> > could.net> > > Robert Kern
> could.net> > could.net> > > rkern at ucsd.edu
> could.net> > could.net> > >
> could.net> > could.net> > > "In the fields of hell where the grass grows high
> could.net> > could.net> > >   Are the graves of dreams allowed to die."
> could.net> > could.net> > >    -- Richard Harter
> could.net> > could.net> > >
> could.net> > could.net> > > --
> could.net> > could.net> > > http://mail.python.org/mailman/listinfo/python-list