(Simple?) Unicode Question

Rami Chowdhury rami.chowdhury at gmail.com
Thu Aug 27 12:44:41 EDT 2009


> Further, does anything, except a printing device need to know the
> encoding of a piece of "text"?

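I may be wrong, but I believe that's part of the idea behind the separation
of the string and bytes types in Python 3.x. I believe, if you are using
Python 3.x and open the file in binary mode, you get bytes objects and don't
need the character encoding mumbo jumbo at all ;-) The encoding only comes
into it when you convert those bytes to or from text (str).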
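
Here is a minimal sketch of the scenario from the original question (read 10
bytes, replace the 2nd, write them all back), assuming Python 3. The helper
name replace_byte and the file names in.bin / out.bin are made up for
illustration; no encoding is mentioned anywhere:

```python
import os
import tempfile


def replace_byte(src_path, dst_path, index, new_value):
    """Copy src to dst, replacing the byte at `index` with `new_value` (0-255)."""
    with open(src_path, "rb") as src:   # binary mode: read() returns bytes
        data = bytearray(src.read())    # bytearray is mutable, bytes is not
    data[index] = new_value
    with open(dst_path, "wb") as dst:   # binary mode: bytes are written as-is
        dst.write(data)
    return bytes(data)


# Demo with a throwaway 10-byte file (hypothetical names).
tmp_dir = tempfile.mkdtemp()
src_path = os.path.join(tmp_dir, "in.bin")
dst_path = os.path.join(tmp_dir, "out.bin")
with open(src_path, "wb") as f:
    f.write(b"0123456789")

result = replace_byte(src_path, dst_path, 1, ord("X"))  # index 1 is the 2nd byte
print(result)  # b'0X23456789'
```

Whether the result is still valid text in some encoding is a separate question,
but the read/modify/write itself never needs to know.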

If you're using Python 2.x, I believe if you simply set the file
opening mode to binary then the data you read() is still treated as a
string of bytes (a plain str), although you may encounter issues trying to
access the n'th character: with a multi-byte encoding such as UTF-8, the
n'th byte is not necessarily the n'th character.
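
To illustrate the n'th character issue, here is a small sketch; the sample
word and the choice of UTF-8 are my own assumptions for the example. Indexing
the raw bytes gives you a byte, and only decoding with the right encoding
gives you a character:

```python
text = u"na\u00efve"            # 5 characters: n, a, i-with-diaeresis, v, e
raw = text.encode("utf-8")      # 6 bytes: the accented letter takes two bytes

third_byte = raw[2:3]           # only the first half of the accented letter
third_char = raw.decode("utf-8")[2]   # the whole character, after decoding

print(len(text), len(raw))      # 5 6
```

So as long as you only shuffle bytes around you can ignore the encoding, but
the moment you want "the 3rd character" you have to know it.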

Please do correct me if I'm wrong, anyone.

On Thu, 27 Aug 2009 09:39:06 -0700, Shashank Singh  
<shashank.sunny.singh at gmail.com> wrote:

> Hi All!
>
> I have a very simple (and probably stupid) question eluding me.
> When exactly is the char-set information needed?
>
> To make my question clear consider reading a file.
> While reading a file, all I get is basically an array of bytes.
>
> Now suppose a file has 10 bytes in it (all is data, no metadata,
> forget the BOM and stuff for a little while). I read it into an array of
> 10 bytes, replace, say, the 2nd byte and write all the bytes back to a new
> file.
>
> Do I need the character encoding mumbo jumbo anywhere in this?
>
> Further, does anything, except a printing device need to know the
> encoding of a piece of "text"? I mean, as long as we are not trying
> to get a symbolic representation of a "text" or get "i"th character
> of it, all we need to do is to carry the intended encoding as
> an auxiliary information to the data stored as byte array.
>
> Right?
>
> --shashank



-- 
Rami Chowdhury
"Never attribute to malice that which can be attributed to stupidity" --  
Hanlon's Razor
408-597-7068 (US) / 07875-841-046 (UK) / 0189-245544 (BD)


