RE Module Performance

Steven D'Aprano steve+comp.lang.python at pearwood.info
Thu Jul 25 11:26:13 EDT 2013


On Thu, 25 Jul 2013 14:36:25 +0100, Jeremy Sanders wrote:

> wxjmfauth at gmail.com wrote:
> 
>> Short example. Writing an editor with something like the FSR is simply
>> impossible (properly).
> 
> http://www.gnu.org/software/emacs/manual/html_node/elisp/Text-Representations.html#Text-Representations
> 
> "To conserve memory, Emacs does not hold fixed-length 22-bit numbers
> that are codepoints of text characters within buffers and strings.
> Rather, Emacs uses a variable-length internal representation of
> characters, that stores each character as a sequence of 1 to 5 8-bit
> bytes, depending on the magnitude of its codepoint[1]. For example, any
> ASCII character takes up only 1 byte, a Latin-1 character takes up 2
> bytes, etc. We call this representation of text multibyte.

Well, you've just proven what Vim users have always suspected: Emacs 
doesn't really exist.


> [1] This internal representation is based on one of the encodings
> defined by the Unicode Standard, called UTF-8, for representing any
> Unicode codepoint, but Emacs extends UTF-8 to represent the additional
> codepoints it uses for raw 8-bit bytes and characters not unified with
> Unicode.
> "
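For anyone following along, the variable-length idea in the quoted passage 
is easy to see from Python 3 using plain UTF-8. This is only a rough sketch: 
it uses Python's standard "utf-8" codec, not Emacs's extended internal form, 
so it tops out at 4 bytes per character and does not reproduce the 5-byte 
sequences or the raw-byte codepoints the manual mentions. The sample 
characters and the helper name utf8_len are my own picks for illustration.

```python
# Byte length of single characters under standard UTF-8.
# Assumption: Python 3. This is NOT Emacs's extended encoding.

def utf8_len(ch):
    """Return how many bytes standard UTF-8 uses for one character."""
    return len(ch.encode("utf-8"))

# One ASCII, one Latin-1, one other BMP, one astral character.
samples = ["a", "\u00e9", "\u20ac", "\U0001F600"]

for ch in samples:
    print("U+%04X takes %d byte(s)" % (ord(ch), utf8_len(ch)))
```

The ASCII and Latin-1 rows should match the 1-byte and 2-byte figures in 
the manual's own example.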

Do you know what those characters not unified with Unicode are? Is there 
a list somewhere? I've read all of the pages from here to no avail:

http://www.gnu.org/software/emacs/manual/html_node/elisp/Non_002dASCII-Characters.html



-- 
Steven


