Python3: Reading a text/binary mixed file
Cameron Simpson
cs at zip.com.au
Tue Mar 10 21:09:24 EDT 2015
On 10Mar2015 22:38, Paulo da Silva <p_s_d_a_s_i_l_v_a_ns at netcabo.pt> wrote:
>On 10-03-2015 04:14, Cameron Simpson wrote:
>> On 10Mar2015 04:01, Paulo da Silva <p_s_d_a_s_i_l_v_a_ns at netcabo.pt> wrote:
>>> But this is very tricky! I am on linux, but if I ran this program on
>>> windows I needed to change it to "eat" also the '\r'.
>>
>> If you're in Python 3 (recommended!) and you're parsing the headers as
>> text, you should be converting your split binary into strings anyway. So
>> you can just use .strip() or rstrip(); either will remove trailing '\r'
>> and '\n', so it will work in both UNIX and Windows.
>>
>I didn't know strip removes \r.
The documentation for str.strip says it strips "whitespace" by default. In the
string module doco it says:

    string.whitespace
        A string containing all ASCII characters that are considered
        whitespace. This includes the characters space, tab, linefeed,
        return, formfeed, and vertical tab.
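
A quick way to check this yourself; this is only a sketch, and the header
text is made up for illustration:

```python
import string

# Hypothetical header lines, one with a UNIX ending and one with a Windows ending.
unix_line = "Content-Length: 42\n"
dos_line = "Content-Length: 42\r\n"

# With no argument, rstrip() removes all trailing whitespace, '\r' included.
stripped_unix = unix_line.rstrip()
stripped_dos = dos_line.rstrip()

# bytes objects have rstrip() too; there it removes ASCII whitespace.
stripped_bytes = b"Content-Length: 42\r\n".rstrip()

# '\r' and '\n' are both listed in string.whitespace.
has_cr = "\r" in string.whitespace
has_lf = "\n" in string.whitespace
```

Both stripped strings come out identical, so the same code should cope with
either line ending.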
[...]
>> I presume you're gathering the headers in "binary" mode and decoding
>> each to a string. So you know the consumed length from the binary half;
>> that they're different lengths after decoding to strings is then
>> irrelevant.
>You are right.
>I am still a little confused about python3.
In this context the main point is that python 3 has a nice clean separation of
str (as text) and bytes (as octet sized small ints). In general that makes it
easier to work with in contexts like this because you are never confused about
which you are dealing with.
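
A small illustration of that separation (the sample text is arbitrary):

```python
text = "caf\u00e9"             # a str: 4 characters of text
data = text.encode("utf-8")    # a bytes object: 5 octets, since the e-acute takes 2

# Indexing a bytes object yields a small int, not a 1-character string.
first_byte = data[0]

# Decoding with the same encoding gets the text back.
round_trip = data.decode("utf-8")

# Python 3 refuses to mix the two types silently.
try:
    text + data
    mixing_failed = False
except TypeError:
    mixing_failed = True
```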
Since binary files (returning bytes from reads) also have a convenient readline
method looking for byte 10 ('\n'), this makes your current task tractable: read
"binary" lines, getting bytes objects ending in byte 10, then decode each
bytes object into a str based on the text encoding (typically utf-8, or
iso8859-1 or ascii for some protocols/formats not thinking strongly about bytes
vs text).
Once decoded, you can then work on them as text without worrying about their
former binary encoding.
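
Here is a rough sketch of that approach. I am assuming a file layout you have
not actually described: text header lines, then a blank line, then the binary
payload. The read_headers function and the sample data are made up for
illustration, and io.BytesIO stands in for a file opened with mode "rb".

```python
import io

def read_headers(fp, encoding="utf-8"):
    """Read header lines from a binary file object up to a blank line.

    Returns (headers, consumed): the decoded, stripped header strings and
    the number of bytes read, counted on the bytes side before decoding.
    """
    headers = []
    consumed = 0
    while True:
        raw = fp.readline()        # a bytes object ending in byte 10, or b"" at EOF
        if not raw:
            break
        consumed += len(raw)       # length in bytes, not in characters
        line = raw.decode(encoding).rstrip()   # drops '\n' and any '\r'
        if not line:
            break                  # blank line: end of headers (assumed layout)
        headers.append(line)
    return headers, consumed

# Sample data: two headers with Windows line endings, a blank line, 4 payload bytes.
sample = io.BytesIO(
    "Name: caf\u00e9\r\nLength: 4\r\n\r\n".encode("utf-8") + b"\x00\x01\x02\x03"
)
headers, consumed = read_headers(sample)
payload = sample.read()            # the file position is now just past the headers
```

Note that consumed is larger than the total length of the decoded strings,
because of the line endings and the multi-byte character; that difference does
not matter since the count is taken from the bytes objects.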
Cheers,
Cameron Simpson <cs at zip.com.au>
Institutions will try to preserve the problem to which they are the solution.
- Clay Shirky, 2012