python tr equivalent (non-ascii)

kettle Josef.Robert.Novak at gmail.com
Wed Aug 13 06:12:48 EDT 2008


On Aug 13, 5:33 pm, Fredrik Lundh <fred... at pythonware.com> wrote:
> kettle wrote:
> >  I was wondering how I ought to be handling character range
> > translations in python.
>
> >  What I want to do is translate fullwidth numbers and roman alphabet
> > characters into their halfwidth ascii equivalents.
> >  In perl I can do this pretty easily with tr:
>
> > tr/\x{ff00}-\x{ff5e}/\x{0020}-\x{007e}/;
>
> >  and I think the string.translate method is what I need to use to
> > achieve the equivalent in python.  Unfortunately the maketrans method
> > doesn't seem to accept character ranges and I'm also having trouble
> > with its interpretation of length.  What I came up with was to first
> > fudge the ranges:
>
> > my_test_string = u"ABCDEFG"
> > f_range = "".join([unichr(x) for x in
> > range(ord(u"\uff00"),ord(u"\uff5e"))])
> > t_range = "".join([unichr(x) for x in
> > range(ord(u"\u0020"),ord(u"\u007e"))])
>
> >  then use these as input to maketrans:
> > my_trans_string =
> > my_test_string.translate(string.maketrans(f_range,t_range))
> > Traceback (most recent call last):
> >   File "<stdin>", line 1, in ?
> > UnicodeEncodeError: 'ascii' codec can't encode characters in position
> > 0-93: ordinal not in range(128)
>
> maketrans only works for byte strings.
>
> as for translate itself, it has different signatures for byte strings
> and unicode strings; in the former case, it takes a lookup table
> represented as a 256-byte string (e.g. created by maketrans), in the
> latter case, it takes a dictionary mapping from ordinals to ordinals or
> unicode strings.
>
> something like
>
>     # 0x5f entries cover 0xff00-0xff5e inclusive, matching the tr range
>     lut = dict((0xff00 + ch, 0x0020 + ch) for ch in range(0x5f))
>
>     new_string = old_string.translate(lut)
>
> could work (untested).
>
> </F>

excellent.  i didn't realize from the docs that i could do that. thanks
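
For reference, here is a minimal sketch of the dictionary-based translate approach described above. It assumes Python 3, where every str is unicode and unichr is no longer needed; the thread itself is about Python 2. The helper names (build_fullwidth_table, to_halfwidth) are made up for illustration, and the extra mapping for U+3000 (ideographic space) is my own assumption, not something from the original tr expression.

```python
# Sketch only, assuming Python 3: str.translate accepts a dict that maps
# code points (ints) to code points, strings, or None.

FULLWIDTH_FIRST = 0xFF00   # start of the range used in the tr expression
FULLWIDTH_LAST = 0xFF5E    # FULLWIDTH TILDE, inclusive end of the range
OFFSET = 0xFF00 - 0x0020   # distance between the two ranges


def build_fullwidth_table():
    """Map 0xff00-0xff5e onto 0x20-0x7e, like the Perl tr range."""
    # range() excludes its end point, hence the + 1.
    table = {cp: cp - OFFSET
             for cp in range(FULLWIDTH_FIRST, FULLWIDTH_LAST + 1)}
    # Assumption: also fold the ideographic space into an ASCII space.
    table[0x3000] = 0x20
    return table


def to_halfwidth(text, table=build_fullwidth_table()):
    """Return text with fullwidth characters replaced by ASCII ones."""
    return text.translate(table)


sample = "\uff21\uff22\uff23\uff11\uff12\uff13"  # fullwidth "ABC123"
print(to_halfwidth(sample))  # prints ABC123
```

Characters that are not keys in the table pass through unchanged, so mixed text is safe to run through to_halfwidth.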


