JSON encoding PDF or Excel files in Python 2.7

Irmen de Jong irmen.NOSPAM at xs4all.nl
Fri Jul 21 15:30:13 EDT 2017


On 21/07/2017 20:52, Skip Montanaro wrote:
> I would like to JSON encode some PDF and Excel files. I can read the content:
> 
> pdf = open("somefile.pdf", "rb").read()
> 
> but now what?  json.dumps() insists on treating it as a string to be
> interpreted as utf-8, and bytes == str in Python 2.x. I can't
> json.dumps() a bytearray. I can pickle the raw content and json.dumps
> that, but I can't guarantee the listener at the other end will be
> written in Python. Am I going to have to do something like
> base64-encode the raw bytes to transmit them?
> 
> Thx,
> 
> Skip
> 

Yes, JSON is a text-based format and can't contain arbitrary binary data. So you'll have
to encode the bytes into some textual form first.
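
A minimal sketch of the base64 route, written to behave the same on Python 2.7 and 3.
The envelope layout (the "filename", "encoding" and "data" keys) and the two helper
names are made up for illustration; whatever the listener at the other end expects is
what you should actually send.

```python
import base64
import json


def encode_file_payload(raw, filename):
    """Wrap raw bytes in a JSON document, base64-encoding the content."""
    # b64encode returns a byte string; decoding it as ASCII gives text on
    # both Python 2 and Python 3, which json.dumps accepts without complaint.
    text = base64.b64encode(raw).decode("ascii")
    # Hypothetical envelope: these key names are an assumption, not a standard.
    return json.dumps({"filename": filename, "encoding": "base64", "data": text})


def decode_file_payload(payload):
    """Reverse encode_file_payload and return (filename, raw bytes)."""
    doc = json.loads(payload)
    # Re-encode the text as ASCII bytes before handing it to b64decode.
    return doc["filename"], base64.b64decode(doc["data"].encode("ascii"))


# Round trip with a few bytes that are not valid UTF-8.
raw = b"%PDF-1.4\n\x00\xff\xfe"
payload = encode_file_payload(raw, "somefile.pdf")
name, restored = decode_file_payload(payload)
assert restored == raw
```

Any language with a base64 decoder and a JSON parser can read the result, so the
receiver does not need to be written in Python.
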
If you think base-64 is too verbose you might try base-85 instead, which is slightly more
efficient: roughly 25% size overhead instead of roughly 33%. It is available in the base64
module since Python 3.4, so not in the 2.7 standard library.
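
To put a number on the difference, here is a small comparison sketch. It assumes
Python 3.4 or later, because base64.b85encode does not exist in 2.7; the helper name
encoded_sizes is just for illustration.

```python
import base64


def encoded_sizes(raw):
    """Return the length of raw and of its base64 and base85 encodings."""
    return {
        "raw": len(raw),
        "base64": len(base64.b64encode(raw)),  # 4 output chars per 3 input bytes
        "base85": len(base64.b85encode(raw)),  # 5 output chars per 4 input bytes
    }


# 1024 bytes covering every possible byte value, as a stand-in for file content.
sample = bytes(range(256)) * 4
print(encoded_sizes(sample))

# base85 round-trips just like base64 does.
assert base64.b85decode(base64.b85encode(sample)) == sample
```

As far as I know, the alphabet used by b85encode contains neither the double quote nor
the backslash, so the encoded text should not need extra escaping inside a JSON string.
The receiver still needs a decoder for the same base-85 variant, which is less widely
available than base64.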

Irmen




