python tr equivalent (non-ascii)

Fredrik Lundh fredrik at pythonware.com
Wed Aug 13 04:33:13 EDT 2008


kettle wrote:

>  I was wondering how I ought to be handling character range
> translations in python.
> 
>  What I want to do is translate fullwidth numbers and roman alphabet
> characters into their halfwidth ascii equivalents.
>  In perl I can do this pretty easily with tr:
> 
> tr/\x{ff00}-\x{ff5e}/\x{0020}-\x{007e}/;
> 
>  and I think the string.translate method is what I need to use to
> achieve the equivalent in python.  Unfortunately the maketrans method
> doesn't seem to accept character ranges and I'm also having trouble
> with its interpretation of length.  What I came up with was to first
> fudge the ranges:
> 
> my_test_string = u"ABCDEFG"
> f_range = "".join([unichr(x) for x in
> range(ord(u"\uff00"),ord(u"\uff5e"))])
> t_range = "".join([unichr(x) for x in
> range(ord(u"\u0020"),ord(u"\u007e"))])
> 
>  then use these as input to maketrans:
> my_trans_string =
> my_test_string.translate(string.maketrans(f_range,t_range))
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> UnicodeEncodeError: 'ascii' codec can't encode characters in position
> 0-93: ordinal not in range(128)

maketrans only works for byte strings.

as for translate itself, it has different signatures for byte strings
and unicode strings; in the former case, it takes a lookup table
represented as a 256-byte string (e.g. created by maketrans), in the
latter case, it takes a dictionary mapping from ordinals to ordinals or
unicode strings.
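
to make the two signatures concrete, here is a minimal sketch. it assumes
Python 3, where the byte-string helper is spelled bytes.maketrans and text
strings are unicode by default (the thread itself is about Python 2, where
the same split exists between str and unicode). the sample strings and
table contents are made up for illustration.

```python
# Byte strings: translate() takes a 256-byte lookup table,
# which bytes.maketrans() builds from two equal-length byte strings.
byte_table = bytes.maketrans(b"abc", b"xyz")
table_length = len(byte_table)  # one entry per possible byte value
byte_result = b"aabbcc".translate(byte_table)

# Text (unicode) strings: translate() takes a dict keyed by code point.
# A value may be a code point, a replacement string, or None (delete).
text_table = {
    0xFF21: 0x0041,  # FULLWIDTH LATIN CAPITAL LETTER A -> code point of "A"
    0xFF22: "B",     # FULLWIDTH LATIN CAPITAL LETTER B -> a string
    0xFF23: None,    # FULLWIDTH LATIN CAPITAL LETTER C -> removed
}
text_result = "\uff21\uff22\uff23".translate(text_table)
```

code points that are missing from the dictionary pass through unchanged,
so the table only needs entries for the characters being converted.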

something like

    # 0x5f entries: 0xff00..0xff5e -> 0x0020..0x007e, same span as the perl tr
    lut = dict((0xff00 + ch, 0x0020 + ch) for ch in range(0x5f))

    new_string = old_string.translate(lut)

could work (untested).
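
for reference, a self-contained version of the same idea, written as a
sketch for Python 3 (an assumption; on Python 2 the strings would need the
u"" prefix). make_range_table is a hypothetical helper name, not a library
function, and the range bounds are the ones from the Perl tr expression in
the question.

```python
def make_range_table(src_first, src_last, dst_first):
    """Map the code points src_first..src_last (inclusive) onto a
    same-length run of code points starting at dst_first."""
    count = src_last - src_first + 1
    return {src_first + i: dst_first + i for i in range(count)}


# Same span as tr/\x{ff00}-\x{ff5e}/\x{0020}-\x{007e}/
table = make_range_table(0xFF00, 0xFF5E, 0x0020)

fullwidth = "\uff21\uff22\uff23\uff11\uff12\uff13"  # fullwidth "ABC123"
halfwidth = fullwidth.translate(table)
```

with this table, halfwidth ends up as the plain ASCII string "ABC123".
note that the table starts at U+FF00 only because the Perl range does; the
ideographic (fullwidth) space is U+3000, which lies outside this range and
would need its own entry if it matters for the input.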

</F>



