Chardet, file, ... and the Flexible String Representation

Piet van Oostrum piet at vanoostrum.org
Fri Sep 6 11:46:14 EDT 2013


wxjmfauth at gmail.com writes:

> The Flexible String Representation has conceptually to
> face the same problem. It splits "unicode" in chunks and
> it has to solve two problems at the same time, the coding
> and the handling of multiple "char sets". The problem?
> It fails.
> "This poor Flexible String Representation does not succeed
> to solve the problem it create itsself."

The FSR does not split Unicode into chunks. It does not create problems, and therefore it doesn't have to solve them.

The FSR simply stores a Unicode string as an array[*] of ints (the Unicode code points of the characters of the string). That's it. It then stores this array in a memory-efficient way: every element gets the same width, 1, 2 or 4 bytes, chosen as the smallest width that can hold the largest code point in that particular string. But that has nothing to do with character sets. The same principle could be used for any array of ints.
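
To make the storage idea concrete, here is a minimal Python sketch of the principle described above. It is only an illustration under stated assumptions, not CPython's actual C implementation (that is specified in PEP 393 and has extra details such as cached UTF-8 and compact ASCII objects). The function names pack_ints, unpack_int and pack_string are hypothetical. Note that pack_ints never looks at characters at all; it works on any list of non-negative ints, and pack_string merely feeds it code points.

```python
def pack_ints(values):
    """Hypothetical sketch: store non-negative ints at one shared width.

    The width is the smallest of 1, 2 or 4 bytes that can hold the
    largest value in the list. Nothing here knows about "char sets".
    """
    largest = max(values, default=0)
    if largest < 0x100:
        width = 1
    elif largest < 0x10000:
        width = 2
    elif largest < 0x100000000:
        width = 4
    else:
        raise ValueError("value does not fit in 4 bytes")
    # Every element uses the same width, so indexing stays O(1).
    data = b"".join(v.to_bytes(width, "little") for v in values)
    return width, data


def unpack_int(width, data, index):
    """Read element `index` back by plain offset arithmetic."""
    start = index * width
    return int.from_bytes(data[start:start + width], "little")


def pack_string(s):
    """A string is just the array of its code points."""
    return pack_ints([ord(ch) for ch in s])


if __name__ == "__main__":
    for text in ("abc", "a\u00e9", "a\u20ac", "a\U0001F600"):
        width, data = pack_string(text)
        print(repr(text), "width:", width, "bytes:", len(data))
```

Because every element shares one width, the i-th code point is found at offset i * width, just as in a plain C array. The same pack_ints call works unchanged on, say, a list of sensor readings, which is the point: the scheme is about storing ints compactly, not about character sets.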

So you are seeking problems where there are none. And you would have a lot more peace of mind if you stopped doing this.

[*] array in the C sense.
-- 
Piet van Oostrum <piet at vanoostrum.org>
WWW: http://pietvanoostrum.com/
PGP key: [8DAE142BE17999C4]


