python tr equivalent (non-ascii)

kettle Josef.Robert.Novak at gmail.com
Wed Aug 13 04:28:35 EDT 2008


On Aug 13, 5:18 pm, kettle <Josef.Robert.No... at gmail.com> wrote:
> Hi,
>  I was wondering how I ought to be handling character range
> translations in python.
>
>  What I want to do is translate fullwidth numbers and roman alphabet
> characters into their halfwidth ascii equivalents.
>  In perl I can do this pretty easily with tr:
>
> tr/\x{ff00}-\x{ff5e}/\x{0020}-\x{007e}/;
>
>  and I think the string.translate method is what I need to use to
> achieve the equivalent in Python.  Unfortunately the maketrans method
> doesn't seem to accept character ranges, and I'm also having trouble
> with its interpretation of length.  What I came up with was to first
> fudge the ranges:
>
> my_test_string = u"ABCDEFG"
> f_range = "".join([unichr(x) for x in range(ord(u"\uff00"),ord(u"\uff5e"))])
> t_range = "".join([unichr(x) for x in range(ord(u"\u0020"),ord(u"\u007e"))])
>
>  then use these as input to maketrans:
> my_trans_string = my_test_string.translate(string.maketrans(f_range,t_range))
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> UnicodeEncodeError: 'ascii' codec can't encode characters in position
> 0-93: ordinal not in range(128)
>
>  but it generates an encoding error... and if I encode the ranges in
> UTF-8 before passing them on I get a length error, because maketrans is
> counting bytes, not characters, and UTF-8 is variable width...
> my_trans_string = my_test_string.translate(string.maketrans(f_range.encode("utf8"),t_range.encode("utf8")))
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> ValueError: maketrans arguments must have same length
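The length error is easy to reproduce: each fullwidth code point takes three bytes in UTF-8, while its ASCII counterpart takes one, so the encoded ranges no longer line up. A quick check, written for Python 3 (an assumption; the quoted session is Python 2, where `unichr` plays the role of `chr`):

```python
# Build the same two ranges as the quoted attempt; range() excludes its
# end point, so U+FF5E and U+007E are left out, giving 94 characters each.
f_range = "".join(chr(x) for x in range(0xFF00, 0xFF5E))
t_range = "".join(chr(x) for x in range(0x0020, 0x007E))

# Character counts agree...
print(len(f_range), len(t_range))  # 94 94

# ...but UTF-8 byte counts do not: every code point in U+FF00..U+FF5D
# encodes to 3 bytes, every ASCII character to 1 byte.
print(len(f_range.encode("utf-8")), len(t_range.encode("utf-8")))  # 282 94
```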

OK, so I guess I was barking up the wrong tree.  Searching for
"python 全角 半角" (Japanese for "fullwidth halfwidth") quickly brought up a solution:
>>> import unicodedata
>>> my_test_string = u"フガホゲ-%*@ABC−%*@123"
>>> print unicodedata.normalize('NFKC', my_test_string)
フガホゲ-%*@ABC-%*@123
>>>
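NFKC works here because the fullwidth forms carry compatibility decompositions to plain ASCII, but it does more than a tr-style range shift: it also widens halfwidth katakana and folds other compatibility characters. A small sketch, assuming Python 3 (the session above is Python 2), with the inputs written as escapes so they survive copy and paste:

```python
import unicodedata

# Fullwidth "ＡＢＣ１２３" (U+FF21.., U+FF11..) folds to plain ASCII.
print(unicodedata.normalize("NFKC", "\uff21\uff22\uff23\uff11\uff12\uff13"))  # ABC123

# Halfwidth katakana "ﾌｶﾞ" is widened, and the separate voiced-sound
# mark is composed into the base character: result is "フガ".
print(unicodedata.normalize("NFKC", "\uff8c\uff76\uff9e"))

# Unrelated compatibility characters change too, e.g. the "fi" ligature.
print(unicodedata.normalize("NFKC", "\ufb01le"))  # file
```

If only the fullwidth ASCII block should change and everything else must be left alone, NFKC is too broad and a character-level translation table is the closer match to Perl's tr.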

Still, it would be nice if there were a more general solution, or if
maketrans actually looked at characters instead of bytes, methinks.
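For what it's worth, Python 3's `str.maketrans` does pair up characters rather than bytes, so the Perl one-liner carries over once the ranges are expanded by hand. A minimal sketch, assuming Python 3; `expand` and `tr` are hypothetical helper names, and only single characters and inclusive `a-b` ranges are handled (none of Perl tr's escapes, flags, or short-replacement rules):

```python
def expand(spec):
    """Hypothetical helper: expand a tr-style spec such as "a-z".

    Handles single characters and inclusive a-b ranges only; a '-' at
    the start or end of the spec is taken literally.
    """
    out = []
    i = 0
    while i < len(spec):
        if i + 2 < len(spec) and spec[i + 1] == "-":
            lo, hi = ord(spec[i]), ord(spec[i + 2])
            out.extend(chr(c) for c in range(lo, hi + 1))  # inclusive, like tr
            i += 3
        else:
            out.append(spec[i])
            i += 1
    return "".join(out)


def tr(text, src, dst):
    """Hypothetical helper: translate text like Perl's tr/src/dst/
    for equal-length character sets."""
    src, dst = expand(src), expand(dst)
    # str.maketrans pairs characters by position (not bytes) and raises
    # ValueError if the two expanded strings differ in length.
    return text.translate(str.maketrans(src, dst))


# Same ranges as tr/\x{ff00}-\x{ff5e}/\x{0020}-\x{007e}/ (95 characters each).
# Note: U+FF00 is unassigned; the fullwidth space is U+3000, not covered here.
print(tr("\uff21\uff22\uff23\uff11\uff12\uff13", "\uff00-\uff5e", " -~"))  # ABC123
```

`str.maketrans(src, dst)` returns a plain dict from code points to code points; under Python 2 the same dict, built with `dict(zip(map(ord, src), map(ord, dst)))`, can be passed to `unicode.translate`.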





More information about the Python-list mailing list