python 2 to 3 converter

Chris Angelico rosuav at gmail.com
Tue Dec 10 10:11:40 EST 2019


On Wed, Dec 11, 2019 at 1:57 AM songbird <songbird at anthive.com> wrote:
>
> Chris Angelico wrote:
> > On Tue, Dec 10, 2019 at 12:15 PM songbird <songbird at anthive.com> wrote:
> >>
> >> Chris Angelico wrote:
> >> ...
> >> >
> >> > Here's an example piece of code.
> >> >
> >> > sock = socket.socket(...)
> >> > name = input("Enter your username: ")
> >> > code = input("Enter the base64 code: ")
> >> > code = base64.b64decode(code)
> >> > sock.write("""GET /foo HTTP/1.0
> >> > Authentication: Demo %s/%s
> >> >
> >> > """ % (name, code))
> >> > match = re.search(r"#[A-Za-z0-9]+#", sock.read())
> >> > if match: print("Response: " + match.group(0))
> >> >
> >> > Your challenge: Figure out which of those strings should be a byte
> >> > string and which should be text. Or alternatively, prove that this is
> >> > a hard problem. There are only a finite number of types - two, to be
> >> > precise - so by your argument, this should be straightforward, right?
> >>
> >>   this isn't a process of looking at isolated code.  this
> >> is a process of looking at the code, but also the test cases
> >> or working examples.  so the inputs are known and the code
> >> itself gives clues about what it is expecting.
> >
> > Okay. The test cases are also written in Python, and they use
> > unadorned string literals to provide mock values for input() and the
> > socket response. Now what?
>
>   wouldn't there be clues in how that string is used in
> the program itself (either calls to converters or when
> the literal is assigned to some variable or used in a
> print statement)?
>
>
> > What if the test cases are entirely ASCII characters?
>
>   it all goes utf in that case and the string is not
> binary.
>
>
> > What if the test cases are NOT entirely ASCII characters?
>
>   if the program has more than one language then you may
> have to see what the character set falls into.  is it hex,
> is it octal or binary or some language?  i'd guess there
> will be clues in the code as to how that string is used
> later.
>
>
> >>   regular expressions can be matched in finite time as well
> >> as a fixed length text of any type can be scanned as a match
> >> or rejected.
> >>
> >>   if you examined a thousand uses of match and found the
> >> pattern used above and then examined what those programs did
> >> with that match, what would you select as the first type?  the
> >> one used the most first; if that doesn't work go with the 2nd,
> >> etc.
> >>
> >
> > That's not really the point. Are your regular expressions working with
> > text or bytes? Does your socket return text or bytes?
>
>   clues in the program again.  you're not limited to looking
> only at the string itself, but the context of the entire
> program.  i'm sure patterns are there to be found; if you
> can scan enough programs they'll start showing up.  once
> you've found a viable pattern then you have a way to
> generate a test case to see if it works or not.
>
>
> > I've deliberately chosen these examples because they are hard. And I
> > didn't even get into an extremely hard problem, with the inclusion of
> > text inside binary data inside of text inside of bytes. (It does
> > happen.)
> >
> > These problems are fundamentally hard because there is insufficient
> > information in the source code alone to determine the programmer's
> > intent.
>
>   that is why we would be running the program itself and
> examining test case results.
>
>   none of these programs run in isolation; what they expect
> as input or produce as output is known.
>

Go do it. Then come back and revisit your assumptions here.
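
As a starting point, here is a minimal Python 3 sketch of two places where
the example above stays ambiguous. The username, the base64 value and the
mock response are made-up stand-ins, not anything from a real program, and
no socket is opened. It only shows that an ASCII-only test run cannot tell
you which type was intended: mixing bytes into a text template does not
fail, and the same regex is valid as either a text or a bytes pattern.

```python
import base64
import re

# Hypothetical stand-ins for the two input() calls in the example.
name = "alice"
code = base64.b64decode("aGVsbG8=")  # b64decode always returns bytes

# Formatting bytes into a text template does not raise by default;
# it silently embeds the repr of the bytes object.
header = "Authentication: Demo %s/%s" % (name, code)

# The same regex source compiles as a text pattern or a bytes pattern.
text_pattern = re.compile(r"#[A-Za-z0-9]+#")
bytes_pattern = re.compile(rb"#[A-Za-z0-9]+#")

# Hypothetical mock of what a socket read might return (bytes).
response = b"HTTP/1.0 200 OK\r\n\r\n#abc123#"

bytes_match = bytes_pattern.search(response)

# Using the text pattern on bytes only fails at run time.
try:
    text_pattern.search(response)
    mixing_error = None
except TypeError as exc:
    mixing_error = exc

print(header)
print(bytes_match.group(0))
print(type(mixing_error).__name__)
```

Nothing in the source text says which of the two patterns the programmer
meant, and the header line "works" while sending the wrong thing.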

ChrisA


More information about the Python-list mailing list