optparse escaping control characters

Steven D'Aprano steve at REMOVE-THIS-cybersource.com.au
Tue Aug 19 09:21:07 EDT 2008


On Tue, 19 Aug 2008 05:35:27 -0700, wannymahoots wrote:

> optparse seems to be escaping control characters that I pass as
> arguments on the command line.  Is this a bug?  Am I missing something? 
> Can this be prevented, or worked around?

You are misinterpreting the evidence. Here's the short explanation:

optparse isn't escaping a control character, because you're not supplying 
it with a control character. You're supplying it with two normal 
characters, which merely *look* like five (including the quote marks) 
because of Python's special handling of backslashes.


If you need it, here's the long-winded explanation.

I've made a small change to your test.py file to demonstrate:

# test.py (modified)
from optparse import OptionParser
parser = OptionParser()
parser.add_option("-d", dest="delimiter", action="store")
(options, args) = parser.parse_args()
print "Options:", options
print "str of options.delimiter =", str(options.delimiter)
print "repr of options.delimiter =", repr(options.delimiter)
print "len of options.delimiter =", len(options.delimiter)


Here's what it does when I call it:

$ python test.py -d '\t'
Options: {'delimiter': '\\t'}
str of options.delimiter = \t
repr of options.delimiter = '\\t'
len of options.delimiter = 2


When you pass '\t' in the command line, the shell sends a literal 
backslash followed by a lowercase t to Python. That is, it sends the 
literal string '\t', not a control character.

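Proof: pass the same string to the "wc" program using "echo" (this 
assumes a shell such as bash, whose echo leaves backslashes alone by 
default). Don't forget that echo adds a newline to the string, so the 
character count in the last column is one more than the string length: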

$ echo 't' | wc  # just a t
      1       1       2
$ echo '\t' | wc  # a backslash and a t, not a control character
      1       1       3
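
The same check can be made from Python itself. This is only a rough 
sketch: it starts a child interpreter with the subprocess module and 
hands it the two characters backslash and t as an argument, which is what 
the shell delivers for a single-quoted '\t'. The names "child", 
"received_length" and "received_repr" are made up for the illustration, 
and print is called with parentheses so the snippet also runs on Python 3.

```python
import subprocess
import sys

# Child script: report the length and the repr of its first argument.
child = "import sys; print(len(sys.argv[1])); print(repr(sys.argv[1]))"

# '\\t' in Python source is the two-character string backslash + t,
# i.e. what the shell passes along for a single-quoted '\t'.
arg = '\\t'

out = subprocess.check_output([sys.executable, "-c", child, arg])
lines = out.decode("ascii").split()

received_length = int(lines[0])   # number of characters the child saw
received_repr = lines[1]          # how the child displays that argument

print(received_length)
print(received_repr)
```

The child reports a length of 2, so no control character was involved at 
any point.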


That's the first half of the puzzle. Now the second half -- why is Python 
adding a *second* backslash to the backslash-t? Actually, it isn't, but 
it *seems* to be adding not just a second backslash but also two quote 
marks.

The backslash in Python is special. If you wanted a literal backslash t 
in a Python string, you would have to type *two* backslashes:

'\\t'

because a single backslash followed by t is an escape sequence, which 
Python turns into a single tab character.

But be careful to note that even though you typed five characters (quote, 
backslash, backslash, t, quote) Python creates a string of length two: a 
single backslash and a t.
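
If you want to convince yourself, something like the following should do 
it. It is just an illustration; the variable names are mine, and the raw 
string r'\t' is thrown in as another way of spelling the same 
two-character string.

```python
literal = '\\t'   # typed as quote, backslash, backslash, t, quote
tab = '\t'        # single backslash + t: the escape sequence for a tab
raw = r'\t'       # raw string: the backslash is not treated as an escape

print(len(literal))       # 2
print(len(tab))           # 1
print(literal == raw)     # True
print(ord(literal[0]))    # 92, the character code of a backslash
print(literal[1])         # t
```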

Now, when you print something using the str() function, Python hides all 
that complexity from you. Hence the line of output that looks like this:

str of options.delimiter = \t

The argument is a literal backslash followed by a t, not a tab character.

But when you print using the repr() function, Python shows you what you 
would have typed -- five characters as follows:

repr of options.delimiter = '\\t'

But that's just the *display* of a two character string. The actual 
string itself is only two characters, despite the two quotes and the two 
backslashes.
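
A small sketch of the difference between the two displays, using a 
string built by hand rather than one taken from the command line (the 
names "shown_by_str" and "shown_by_repr" are only for illustration):

```python
s = '\\t'   # the two-character string: backslash, t

shown_by_str = str(s)     # the string itself
shown_by_repr = repr(s)   # what you would type to get that string

print(shown_by_str)       # \t
print(shown_by_repr)      # '\\t'

# Lengths of the string, its str() form and its repr() form.
print("%d %d %d" % (len(s), len(shown_by_str), len(shown_by_repr)))  # 2 2 5
```

Only the repr() form is five characters long; the string never changes.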

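Now for the final piece of the puzzle: when you print most composite 
objects, like the optparse Values object -- the object named "options" in 
your code -- Python prints the internals of it using repr() rather than 
str(). That is why the "Options:" line shows '\\t' with two backslashes, 
even though the stored delimiter is just a backslash followed by a t.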
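
You can see this without involving the shell at all. The sketch below 
passes an explicit argument list to parse_args() as a stand-in for the 
command line -d '\t'; the name "as_text" is mine, and a plain list is 
included to show that the behaviour is not specific to optparse.

```python
from optparse import OptionParser

parser = OptionParser()
parser.add_option("-d", dest="delimiter", action="store")

# The explicit list stands in for the command line:  -d '\t'
options, args = parser.parse_args(["-d", "\\t"])

as_text = str(options)
print(as_text)                  # {'delimiter': '\\t'}
print(options.delimiter)        # \t
print(len(options.delimiter))   # 2

# A plain list shows its items with repr() too.
print([options.delimiter])      # ['\\t']
```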



-- 
Steven


