is there a safe marshaler?

Alan Kennedy alanmk at hotmail.com
Thu Feb 10 17:49:07 EST 2005


[Irmen de Jong]
 >>> I need a fast and safe (secure) marshaler.

[Alan Kennedy]
 >> ...., would something like JSON be suitable for your need?
 >>
 >> http://json.org

[Irmen de Jong]
 > Looks very interesting indeed, but in what way would this be
 > more secure than say, pickle or marshal?
 > A quick glance at some docs reveals that they are using eval
 > to process the data... ouch.

Well, the python JSON codec provided appears to use eval, which might 
make it *seem* insecure.

http://www.json-rpc.org/pyjsonrpc/index.xhtml

But a more detailed examination of the code indicates, to this reader at 
least, that it can be made completely secure very easily. The designer 
of the code could very easily have not used eval, and possibly didn't do 
so simply because he wasn't thinking in security terms.

The codec uses tokenize.generate_tokens to split up the JSON string into 
tokens to be interpreted as python objects. tokenize.generate_tokens 
yields a series of tuples, each holding a token type and the token's 
text (plus position information), so nothing insecure there: the content 
of the token strings is not executed.
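To make that concrete, here is a small sketch in present-day Python 3 (not the codec's own code) showing that tokenizing only produces type/text pairs. The helper name json_tokens is made up for illustration.

```python
import io
import token
import tokenize

def json_tokens(text):
    """Return (token type name, token text) pairs for a JSON-like string.

    json_tokens is a hypothetical helper, not part of the codec.
    """
    readline = io.StringIO(text).readline
    pairs = []
    for tok in tokenize.generate_tokens(readline):
        # Skip the bookkeeping tokens that Python's tokenizer adds.
        if tok.type in (tokenize.NEWLINE, tokenize.NL, tokenize.ENDMARKER):
            continue
        pairs.append((token.tok_name[tok.type], tok.string))
    return pairs

pairs = json_tokens('{"a": [1, -2.5]}')
# Even a hostile-looking payload is only split into inert strings.
evil = json_tokens('__import__("os").system("x")')
```

The tokenizer never looks up or calls anything named in the input, so any risk comes from what is later done with the token text.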

The token stream is then passed to a "parseValue" function, which is 
defined thusly:

#===================

   def parseValue(self, tkns):
     (ttype, tstr, ps, pe, lne) = tkns.next()
     if ttype in [token.STRING, token.NUMBER]:
       return eval(tstr)
     elif ttype == token.NAME:
       return self.parseName(tstr)
     elif ttype == token.OP:
       if tstr == "-":
        return - self.parseValue(tkns)
       elif tstr == "[":
        return self.parseArray(tkns)
       elif tstr == "{":
        return self.parseObj(tkns)
       elif tstr in ["}", "]"]:
        return EndOfSeq
       elif tstr == ",":
        return SeqSep
       else:
        raise "expected '[' or '{' but found: '%s'" % tstr
     else:
       return EmptyValue

#===================

As you can see, eval is *only* called when the next token in the stream 
is either a string or a number, so it's really just a very simple code 
shortcut to get a value from a string or number.

If one defined the function like this (not tested!), to remove the eval, 
I think it should be safe.

#===================

default_number_type = float
#default_number_type = int

   def parseValue(self, tkns):
     (ttype, tstr, ps, pe, lne) = tkns.next()
     if ttype == token.STRING:
       # tstr still includes its surrounding quotes, so strip them.
       # Backslash escapes are left undecoded here.
       return tstr[1:-1]
     elif ttype == token.NUMBER:
       return default_number_type(tstr)
     elif ttype == token.NAME:
       return self.parseName(tstr)
     elif ttype == token.OP:
       if tstr == "-":
         return - self.parseValue(tkns)
       elif tstr == "[":
         return self.parseArray(tkns)
       elif tstr == "{":
         return self.parseObj(tkns)
       elif tstr in ["}", "]"]:
         return EndOfSeq
       elif tstr == ",":
         return SeqSep
       else:
         # String exceptions are deprecated; raise a real exception.
         raise ValueError("expected '[' or '{' but found: '%s'" % tstr)
     else:
       return EmptyValue

#===================
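Another way to drop eval for these two token types, assuming a Python that ships ast.literal_eval (2.6 and later), is to let it convert the token text. It only accepts literals, so names and calls are rejected. The helper name literal_from_token is made up, and this is a sketch rather than a drop-in patch for the codec.

```python
import ast
import token

def literal_from_token(ttype, tstr):
    """Convert a STRING or NUMBER token to a value without eval().

    literal_from_token is a hypothetical helper for illustration.
    """
    if ttype not in (token.STRING, token.NUMBER):
        raise ValueError("not a literal token: %r" % (tstr,))
    # literal_eval strips the quotes, decodes Python-style escapes,
    # and picks int or float for numbers.
    return ast.literal_eval(tstr)
```

One caveat: this applies Python's literal rules, which are close to JSON's but not identical (JSON's `\/` escape, for example, is not a Python escape).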

The only other use of eval is also only for string types, i.e. in the 
parseObj function:

#===================
   def parseObj(self, tkns):
     obj = {}
     nme =""
     try:
       while 1:
         (ttype, tstr, ps, pe, lne) = tkns.next()
         if ttype == token.STRING:
           nme =  eval(tstr)
           (ttype, tstr, ps, pe, lne) = tkns.next()
           if tstr == ":":
             v = self.parseValue(tkns)
      # Remainder of this function elided
#===================

Which could similarly be replaced with direct use of the string itself 
(minus its surrounding quotes), rather than eval'ing it. (Although one 
might want to look at encoding issues: I haven't looked at JSON-RPC 
enough to know how it proposes to handle string encodings.)
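For the escape side of that question, a hand-written decoder is one option. The following is a rough Python 3 sketch that assumes the escape set listed on json.org; the function name unquote_json_string is invented here, and surrogate pairs are not combined.

```python
import string

# Single-character escapes as listed in the JSON grammar.
_ESCAPES = {'"': '"', '\\': '\\', '/': '/', 'b': '\b',
            'f': '\f', 'n': '\n', 'r': '\r', 't': '\t'}

def unquote_json_string(tstr):
    """Strip the quotes from a JSON string token and decode its escapes.

    unquote_json_string is a hypothetical helper, not codec code.
    """
    if len(tstr) < 2 or tstr[0] != '"' or tstr[-1] != '"':
        raise ValueError("not a double-quoted string: %r" % (tstr,))
    body = tstr[1:-1]
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != '\\':
            out.append(ch)
            i += 1
            continue
        if i + 1 >= len(body):
            raise ValueError("dangling backslash")
        esc = body[i + 1]
        if esc == 'u':
            digits = body[i + 2:i + 6]
            if len(digits) != 4 or not all(c in string.hexdigits for c in digits):
                raise ValueError("bad \\u escape")
            # Surrogate pairs are NOT combined in this sketch.
            out.append(chr(int(digits, 16)))
            i += 6
        elif esc in _ESCAPES:
            out.append(_ESCAPES[esc])
            i += 2
        else:
            raise ValueError("unknown escape: \\%s" % esc)
    return ''.join(out)
```

Nothing in the token text is ever executed here; unknown escapes are rejected rather than passed through.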

So I don't think there are any serious security issues here: the 
"simplicity" of the JSON grammar is what attracted me to it in the first 
place, especially since robust and efficient lexers and parsers are 
already built in to python and javascript (and javascript interpreters 
are getting pretty ubiquitous these days).

And it's certainly the case that even if the only available python 
implementation of JSON-RPC is not secure, it is possible to write one 
that is both efficient and secure.
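As a rough illustration of how small such an implementation can be, here is an untested-in-anger Python 3 sketch of an eval-free reader built on the tokenize module. The names SafeReader and safe_loads are invented, it is not the pyjsonrpc code, and because it leans on Python's tokenizer and ast.literal_eval it accepts a slight superset of JSON (single-quoted strings, hex numbers).

```python
import ast
import io
import token
import tokenize

_NAMES = {"true": True, "false": False, "null": None}
_SKIP = (tokenize.NEWLINE, tokenize.NL, tokenize.INDENT,
         tokenize.DEDENT, tokenize.ENDMARKER)

class SafeReader:
    """Hypothetical eval-free reader for a JSON-like subset."""

    def __init__(self, text):
        readline = io.StringIO(text.strip()).readline
        self.toks = [(t.type, t.string)
                     for t in tokenize.generate_tokens(readline)
                     if t.type not in _SKIP]
        self.pos = 0

    def take(self):
        if self.pos >= len(self.toks):
            raise ValueError("unexpected end of input")
        tok = self.toks[self.pos]
        self.pos += 1
        return tok

    def peek_is(self, op):
        return (self.pos < len(self.toks)
                and self.toks[self.pos] == (token.OP, op))

    def expect(self, op):
        ttype, tstr = self.take()
        if ttype != token.OP or tstr != op:
            raise ValueError("expected %r but found %r" % (op, tstr))

    def value(self):
        ttype, tstr = self.take()
        if ttype in (token.STRING, token.NUMBER):
            return ast.literal_eval(tstr)  # literals only, no code
        if ttype == token.NAME and tstr in _NAMES:
            return _NAMES[tstr]
        if ttype == token.OP and tstr == "-":
            ttype, tstr = self.take()
            if ttype != token.NUMBER:
                raise ValueError("expected a number after '-'")
            return -ast.literal_eval(tstr)
        if ttype == token.OP and tstr == "[":
            return self.sequence("]", keyed=False)
        if ttype == token.OP and tstr == "{":
            return self.sequence("}", keyed=True)
        raise ValueError("unexpected token: %r" % (tstr,))

    def sequence(self, closer, keyed):
        # Shared loop for arrays (keyed=False) and objects (keyed=True).
        result = {} if keyed else []
        if self.peek_is(closer):
            self.take()
            return result
        while True:
            if keyed:
                ttype, tstr = self.take()
                if ttype != token.STRING:
                    raise ValueError("object keys must be strings")
                self.expect(":")
                result[ast.literal_eval(tstr)] = self.value()
            else:
                result.append(self.value())
            if self.peek_is(","):
                self.take()
            else:
                self.expect(closer)
                return result

def safe_loads(text):
    reader = SafeReader(text)
    value = reader.value()
    if reader.pos != len(reader.toks):
        raise ValueError("trailing data after value")
    return value
```

With this sketch, safe_loads('{"a": [1, -2.5, true, null]}') builds a plain dict, while input such as '{"a": __import__("os")}' is rejected with ValueError because __import__ is not one of the three allowed names.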

Hopefully there isn't some glaring security hole that I've missed: 
doubtless I'll find out real soon ;-) Gotta love full disclosure.

regards,

-- 
alan kennedy
------------------------------------------------------
email alan:              http://xhaus.com/contact/alan


