python 2 to 3 converter

Chris Angelico rosuav at gmail.com
Mon Dec 9 20:24:51 EST 2019


On Tue, Dec 10, 2019 at 12:15 PM songbird <songbird at anthive.com> wrote:
>
> Chris Angelico wrote:
> ...
> >
> > Here's an example piece of code.
> >
> > sock = socket.socket(...)
> > name = input("Enter your username: ")
> > code = input("Enter the base64 code: ")
> > code = base64.b64decode(code)
> > sock.write("""GET /foo HTTP/1.0
> > Authentication: Demo %s/%s
> >
> > """ % (name, code))
> > match = re.search(r"#[A-Za-z0-9]+#", sock.read())
> > if match: print("Response: " + match.group(0))
> >
> > Your challenge: Figure out which of those strings should be a byte
> > string and which should be text. Or alternatively, prove that this is
> > a hard problem. There are only a finite number of types - two, to be
> > precise - so by your argument, this should be straightforward, right?
>
>   this isn't a process of looking at isolated code.  this
> is a process of looking at the code, but also the test cases
> or working examples.  so the inputs are known and the code
> itself gives clues about what it is expecting.

Okay. The test cases are also written in Python, and they use
unadorned string literals to provide mock values for input() and the
socket response. Now what?

What if the test cases are entirely ASCII characters?

What if the test cases are NOT entirely ASCII characters?
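
As a rough sketch of why the answer matters (assuming Python 3; the
sample names are made up for illustration and are not from any real
test suite): ASCII-only data encodes to identical bytes under every
common encoding, so such a test cannot reveal what was intended, while
non-ASCII data forces a choice that the test data alone does not make.

```python
# Hypothetical mock values, as a test suite might supply them.
ascii_name = "rosuav"
non_ascii_name = "Zo\u00eb"  # "Zoe" with a diaeresis on the e

# ASCII text yields the same bytes under each of these encodings,
# so a passing test says nothing about which one was meant.
ascii_encodings = {
    ascii_name.encode(enc) for enc in ("ascii", "latin-1", "utf-8")
}

# Non-ASCII text yields different bytes per encoding, so a converter
# would have to pick one without any evidence for the choice.
non_ascii_encodings = {
    non_ascii_name.encode(enc) for enc in ("latin-1", "utf-8")
}

print(len(ascii_encodings), "distinct encoding(s) of the ASCII name")
print(len(non_ascii_encodings), "distinct encoding(s) of the non-ASCII name")
```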

>   regular expressions can be matched in finite time as well
> as a fixed length text of any type can be scanned as a match
> or rejected.
>
>   if you examined a thousand uses of match and found the
> pattern used above and then examined what those programs did
> with that match what would you select as the first type, the
> one used the most first, if that doesn't work go with the 2nd,
> etc.
>

That's not really the point. Are your regular expressions working with
text or bytes? Does your socket return text or bytes?
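
To make the question concrete, here is a small sketch assuming Python 3,
where socket.recv() returns bytes. The response payload below is
invented for illustration. The same regular expression works as a text
pattern or as a bytes pattern, but the two cannot be mixed, so a
converter has to commit to one.

```python
import re

# Hypothetical payload standing in for what a socket might return.
response_bytes = b"HTTP/1.0 200 OK\r\n\r\n#abc123#"
response_text = response_bytes.decode("ascii")

text_pattern = re.compile(r"#[A-Za-z0-9]+#")    # matches str only
bytes_pattern = re.compile(rb"#[A-Za-z0-9]+#")  # matches bytes only

text_match = text_pattern.search(response_text)
bytes_match = bytes_pattern.search(response_bytes)

# Mixing the two is an error in Python 3, not a silent conversion.
try:
    text_pattern.search(response_bytes)
    mixed_ok = True
except TypeError:
    mixed_ok = False

print(text_match.group(0), bytes_match.group(0), mixed_ok)
```

Both variants are valid programs; nothing in the pattern itself says
which one the original author had in mind.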

I've deliberately chosen these examples because they are hard. And I
didn't even get into an extremely hard problem, with the inclusion of
text inside binary data inside of text inside of bytes. (It does
happen.)

These problems are fundamentally hard because there is insufficient
information in the source code alone to determine the programmer's
intent.
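
One more sketch of the same point, again assuming Python 3 (3.5 or
later for bytes formatting) and using made-up input values. The request
template from the earlier example can be read as text or as bytes, and
each reading gives a different result once the decoded base64 value is
substituted in.

```python
import base64

name = "rosuav"                       # hypothetical username
code = base64.b64decode("aGVsbG8=")   # b64decode returns bytes

# Reading 1: the template is text. The bytes value is formatted via
# str(), so its repr ends up in the request without any error.
as_text = "Authentication: Demo %s/%s" % (name, code)

# Reading 2: the template is bytes. The username must then be encoded,
# and the encoding is a guess the source code does not settle.
as_bytes = b"Authentication: Demo %s/%s" % (name.encode("utf-8"), code)

print(as_text)
print(as_bytes)
```

Neither reading raises an exception, which is why running the code does
not tell you which one the programmer meant.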

ChrisA

