Python3: Reading a text/binary mixed file

Cameron Simpson cs at zip.com.au
Tue Mar 10 00:14:31 EDT 2015


On 10Mar2015 04:01, Paulo da Silva <p_s_d_a_s_i_l_v_a_ns at netcabo.pt> wrote:
>On 10-03-2015 00:55, Dave Angel wrote:
>> On 03/09/2015 08:45 PM, Paulo da Silva wrote:
>>> What is the best way to read a file that begins with some few text lines
>>> and whose rest is a binary stream?
[...]
>> Generally speaking, you can treat a piece of a binary (input) file as an
>> encoded string, so you want to open the file as binary, locate the part
>> that's text, and then explicitly decode the string from that.
>
>That's what I did. However, I was thinking of some other more efficient
>and simple way. For example a command to read text and another to read
>bytes.
>
>For .pnm photo files I read the entire file (I needed it in memory
>anyway), split a copy on b'\n', got the header fields and
>then used the original remaining bytes as the photo pixels.
>But this is very tricky! I am on Linux, but if I ran this program on
>Windows I would need to change it to also "eat" the '\r'.

If you're in Python 3 (recommended!) and you're parsing the headers as text, 
you should be decoding your split bytes into strings anyway. So you can just 
call .strip() or .rstrip() on each decoded line; either will remove a trailing 
'\r' and '\n', so it will work with both UNIX and Windows line endings.
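
A minimal sketch of that, assuming a header of exactly three ASCII lines 
(as in a simple .pnm file with no comment lines); the helper name 
read_header_lines and the sample data are made up for illustration:

```python
import io

def read_header_lines(fp, count):
    """Read `count` header lines from a binary file object as stripped text."""
    lines = []
    for _ in range(count):
        raw = fp.readline()  # bytes, still carrying the line terminator
        if not raw:
            raise EOFError("file ended inside the header")
        # rstrip() drops a trailing '\n' and a trailing '\r\n' alike
        lines.append(raw.decode("ascii").rstrip())
    return lines

# The same header with UNIX and Windows line endings, then binary pixel data.
unix = io.BytesIO(b"P6\n2 1\n255\n\x00\x01\x02\x03\x04\x05")
dos = io.BytesIO(b"P6\r\n2 1\r\n255\r\n\x00\x01\x02\x03\x04\x05")

unix_header = read_header_lines(unix, 3)
dos_header = read_header_lines(dos, 3)

# Whatever is left in the file after the header is the binary part.
unix_pixels = unix.read()
dos_pixels = dos.read()
```

Only the header lines get stripped; the remaining bytes are read untouched, 
so pixel values that happen to equal '\r' or '\n' are not disturbed.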

>In the .pnm case the headers don't have special chars. They fit into
>ascii. But in a file that has them it would also be difficult to compute
>the consumed length.

I presume you're gathering the headers in "binary" mode and decoding each to a 
string. So you know the consumed length from the binary half; that they're 
different lengths after decoding to strings is then irrelevant.
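
Something like this sketch, assuming the whole file is already in memory as 
bytes and the headers are UTF-8 (both assumptions; split_header and the 
sample data are invented for the example):

```python
def split_header(data, count, encoding="utf-8"):
    """Split `count` header lines off `data` (bytes); return (lines, offset)."""
    offset = 0
    lines = []
    for _ in range(count):
        # index() works on the bytes, so `end` is a byte position;
        # it raises ValueError if a header line is missing.
        end = data.index(b"\n", offset)
        lines.append(data[offset:end].decode(encoding).rstrip())
        offset = end + 1  # consumed length so far, counted in bytes
    return lines, offset

# Two header lines containing non-ASCII characters, then a binary payload.
data = "título: café\r\nsize: 4\r\n".encode("utf-8") + b"\xff\x00\xfe\x01"

headers, consumed = split_header(data, 2)
payload = data[consumed:]
```

Here the two decoded headers total 19 characters while 25 bytes were 
consumed, but only the byte offset is used to find the payload. Searching for 
b'\n' is safe in UTF-8 because that byte never appears inside a multibyte 
character.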

Cheers,
Cameron Simpson <cs at zip.com.au>

These are my principles, and if you don't like them, I have others.
        - Groucho Marx


