python2 vs python3

Chris Angelico rosuav at gmail.com
Mon Oct 21 17:22:35 EDT 2019


On Tue, Oct 22, 2019 at 8:16 AM Albert-Jan Roskam
<sjeik_appie at hotmail.com> wrote:
>
>
>
> On 18 Oct 2019 20:36, Chris Angelico <rosuav at gmail.com> wrote:
>
> On Sat, Oct 19, 2019 at 5:29 AM Jagga Soorma <jagga13 at gmail.com> wrote:
> >
> > Hello,
> >
> > I am writing my second python script and got it to work using
> > python2.x.  However, realized that I should be using python3 and it
> > seems to fail with the following message:
> >
> > --
> > Traceback (most recent call last):
> >   File "test_script.py", line 29, in <module>
> >     test_cmd = ("diskcmd -u " + x + " | grep -v '\*' | awk '{print $1, $3, $4, $9, $10}'" )
> > TypeError: Can't convert 'bytes' object to str implicitly
> > --
> >
> > I then run this command and save the output like this:
> >
> > --
> > test_info = (subprocess.check_output( test_cmd, stderr=subprocess.STDOUT, shell=True )).splitlines()
> > --
> >
> > Looks like the command output is in bytes and I can't simply wrap that
> > around str().  Thanks in advance for your help with this.
>
> >That's correct. The output of the command is, by default, given to you
> >in bytes.
>
>
> Do you happen to know why this is the default?

Because at the OS level, it's all bytes.
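A minimal sketch of what that means in practice, assuming Python 3.7 or later. It runs the current interpreter as the child process only so the example is portable; any command behaves the same way. The raw result is bytes, and you get str either by decoding yourself or by telling subprocess which encoding to use.

```python
import subprocess
import sys

# Child process: the running Python interpreter, used here only for portability.
cmd = [sys.executable, "-c", "print('hello')"]

# By default, check_output returns the raw bytes the child wrote.
raw = subprocess.check_output(cmd)
print(type(raw))  # <class 'bytes'>

# Option 1: decode explicitly, choosing the encoding yourself.
text = raw.decode("utf-8")

# Option 2: let subprocess decode. text=True would use a locale-dependent
# default; passing encoding="utf-8" makes the choice explicit.
text2 = subprocess.check_output(cmd, encoding="utf-8")

# Both are str now, so concatenating with other strings no longer raises
# "Can't convert 'bytes' object to str implicitly".
print(text.strip() + " world")  # hello world
print(text2.splitlines())       # ['hello']
```

Note that option 2 still just picks an encoding on your behalf; it does not discover the "right" one.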

> And is there a reliable way to figure out the encoding? On posix, it's probably utf8, but on windows I usually use cp437, but knowing windows, it could be any codepage (you can even change it with chcp.exe)
>

Reliable? Nope. You can guess at what your local console would expect,
but there's no way to be certain what a program will produce. You
can't even be sure that the program will produce text - for instance,
I have quite often piped data into or out of FFmpeg, which means the
encoding isn't "UTF-8" or "Windows-1252", but is something like
"16-bit 44 kHz WAV".
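If you do want a guess at "what your local console would expect", one common (and only approximate) choice is the locale's preferred encoding. This is a sketch under that assumption. On Windows this value is typically the ANSI code page (e.g. cp1252), while console tools may emit the OEM code page (e.g. cp437) or whatever chcp last set, so it can easily be wrong.

```python
import locale
import subprocess
import sys

# A guess, not a guarantee: the encoding the current locale prefers.
guess = locale.getpreferredencoding(False)

raw = subprocess.check_output([sys.executable, "-c", "print('ok')"])

# errors="replace" stops a wrong guess from raising; undecodable bytes
# become U+FFFD instead, so look for that character if accuracy matters.
text = raw.decode(guess, errors="replace")
print(guess, repr(text))
```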

If you're uncertain, I would recommend attempting to decode the data
as either ASCII or UTF-8. Most of the encodings you'll come across
will be ASCII-compatible, meaning that decoding as ASCII will either
succeed and give the right result, or fail with a clear exception.
UTF-8 is designed to be similarly reliable, so you should generally be
able to assume that a successful UTF-8 decode will give you the
correct result.
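Here is a small sketch of that advice. `decode_output` and its `fallback` parameter are hypothetical names for illustration. Since every pure-ASCII byte string is also valid UTF-8 and decodes to the same text, a single UTF-8 attempt covers both cases; only if that fails do we fall back to a caller-chosen guess.

```python
def decode_output(data: bytes, fallback: str = "latin-1") -> str:
    """Hypothetical helper: decode command output, preferring UTF-8."""
    try:
        # Strict UTF-8: non-UTF-8 text containing accented letters
        # (cp1252, cp437, ...) usually fails here with a clear exception.
        return data.decode("utf-8")
    except UnicodeDecodeError:
        # Caller-chosen guess. latin-1 maps every byte to some character,
        # so it never raises, but the characters may be the wrong ones.
        return data.decode(fallback, errors="replace")


print(decode_output(b"plain ascii"))                                # plain ascii
print(decode_output("café".encode("utf-8")))                        # café
print(decode_output("café".encode("cp1252"), fallback="cp1252"))    # café
print(decode_output(b"\x82", fallback="cp437"))                     # é
```

The fallback is still a guess; the point of trying UTF-8 first is that a wrong answer there tends to fail loudly rather than silently.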

ChrisA


