[Tutor] string conversion

Steven D'Aprano steve at pearwood.info
Tue Mar 1 00:44:37 CET 2011


Robert Clement wrote:
> Hi
> 
> I have a wxpython control in which users are intended to enter control 
> characters used to define binary string delimiters,  eg. '\xBA\xBA' or 
> '\t\r\n' .


Do you mean that your users enter *actual* control characters? What do 
they type to enter (say) an ASCII null character into the field?

Or do you mean they type the string representation of the control 
character, e.g. for ASCII null they press \ then 0 on their keyboard, 
and the field shows \0 rather than one of those funny little square 
boxes you get for missing characters in fonts?

I will assume you mean the second, because I can't imagine how to enter 
control characters directly into a field (other than the simple ones 
like newline and tab).


> The string returned by the control is a unicode version of the string 
> entered by the user, eg.  u'\\xBA\\xBA'  or  u'\\t\\r\\n' .

The data you are dealing with is binary, that is, made up of bytes 
between 0 and 255. The field is Unicode, that is, made up of characters 
with code points between 0 and some upper limit which is *much* higher 
than 255. If wxPython has some way to set the encoding of the field to 
ASCII, that will probably save you a lot of grief; otherwise, you'll 
need to decide what you want to do if the user types something like £ or 
© or other Unicode characters.
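
If you can't restrict the field itself, one option is to check the text 
before converting it. This is only a minimal sketch, assuming you simply 
want to reject anything outside ASCII; the helper name is_ascii_only is 
made up for illustration and is not something wxPython provides.

```python
def is_ascii_only(text):
    """Return True if every character in text is plain ASCII."""
    try:
        # Encoding to ASCII fails for any code point above 127.
        text.encode('ascii')
    except UnicodeEncodeError:
        return False
    return True


print(is_ascii_only(u'\\xBA\\xBA'))  # True: backslash, x, B, A are all ASCII
print(is_ascii_only(u'\u00a3'))      # False: the pound sign is not ASCII
```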

In any case, it seems that you are expecting strings with the 
representation of control characters, rather than actual control characters.


> I would like to be able retrieve the original string containing the 
> escaped control characters or hex values so that I can assign it to a 
> variable to be used to split the binary string.

You have the original string -- the user typed <backslash> <code>, and 
you are provided <backslash> <code>.

Remember that backslashes in Python are special, and so they are escaped 
when displaying the string. Because \t is used for the display of tab, 
it can't be used for the display of backslash-t. Instead the display of 
backslash is backslash-backslash. But that's just the *display*, not the 
string itself. If you type \t into your field, the string you retrieve 
looks like u'\\t', but calling len() on it gives 2, not 3 or 6. If you 
print it with the print statement, it will print as \t, with no string 
delimiters u' and ' and no escaped backslash.

So you have the original string, exactly as typed by the user. I *think* 
what you want is to convert it to *actual* control characters, so that a 
literal backslash-t is converted to a tab character, etc.


 >>> s = u'\\t'
 >>> print len(s), s, repr(s)
2 \t u'\\t'
 >>> t = s.decode('string_escape')
 >>> print len(t), t, repr(t)
1       '\t'
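
The session above is Python 2. The 'string_escape' codec does not exist 
in Python 3, so here is a rough sketch of one way to get a similar effect 
there and then use the result to split binary data. The function name 
text_to_delimiter and the sample data are invented for illustration, and 
the sketch assumes the user types only ASCII and that every escape names 
a value between 0 and 255.

```python
import codecs


def text_to_delimiter(text):
    """Turn typed escape text such as backslash-x-B-A into the bytes it names."""
    # Fail early (UnicodeEncodeError) if the user typed non-ASCII characters.
    raw = text.encode('ascii')
    # 'unicode_escape' interprets the backslash escapes and returns text.
    unescaped = codecs.decode(raw, 'unicode_escape')
    # Latin-1 maps code points 0-255 one-to-one onto byte values 0-255.
    return unescaped.encode('latin-1')


delimiter = text_to_delimiter('\\xBA\\xBA')
data = b'abc\xba\xbadef\xba\xbaghi'   # made-up sample data
print(data.split(delimiter))
```

An escape such as \u20ac names a code point above 255, so the final 
encode('latin-1') raises UnicodeEncodeError; you would need to decide how 
to report that to the user.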


Hope that helps.




-- 
Steven


