[Python-Dev] RFD: how to build strings from lots of slices?
M.-A. Lemburg
mal@lemburg.com
Mon, 28 Feb 2000 19:15:15 +0100
Fredrik Lundh wrote:
>
> when hacking on SRE's substitution code, I stumbled
> upon a problem. to do a substitution, SRE needs to
> merge slices from the target strings and from the
> substitution pattern.
>
> here's a simple example:
>
> re.sub(
> "(perl|tcl|java)",
> "python (not \\1)",
> "perl rules"
> )
>
> contains a "substitution pattern" consisting of three
> parts:
>
> "python (not " (a slice from the substitution string)
> group 1 (a slice from the target string)
> ")" (a slice from the substitution string)
>
> PCRE implements this by doing the slicing (thus creating
> three new strings), and then doing a "join" by hand into
> a PyString buffer.
>
> this isn't very efficient, and it also doesn't work for
> unicode strings.
Why not? The Unicode implementation has an API,
PyUnicode_Join(), which does exactly this:
    extern DL_IMPORT(PyObject*) PyUnicode_Join(
        PyObject *separator,    /* Separator string */
        PyObject *seq           /* Sequence object */
        );
Note that the PyUnicode_Join() API takes a sequence of
Unicode objects, strings or objects providing the
charbuf interface, coerces each of these into
a Unicode object and then does the joining.
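
At the Python level, the same idea might look roughly like the sketch
below. The helper name expand_template() and the list-of-parts
representation (literal strings mixed with integer group indices) are
made up for illustration and are not how SRE or PCRE store templates.
The point is only that the slices are collected in a sequence and
joined once with an empty separator:

```python
import re

def expand_template(match, parts):
    """Build one replacement string from literal and group slices.

    `parts` is a hypothetical template representation: a str is a
    literal slice of the substitution string, an int is a group index
    referring to a slice of the target string.
    """
    slices = []
    for part in parts:
        if isinstance(part, int):
            # slice taken from the target string
            slices.append(match.group(part))
        else:
            # slice taken from the substitution string
            slices.append(part)
    # one join with an empty separator instead of repeated concatenation
    return "".join(slices)

target = "perl rules"
match = re.search("(perl|tcl|java)", target)
replacement = expand_template(match, ["python (not ", 1, ")"])

# splice the replacement back into the target string
result = target[:match.start()] + replacement + target[match.end():]
```

For the example above, result should agree with what re.sub() returns
for the same pattern, template and target.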
There is also a _PyUnicode_Resize() API. It is currently
not exported though... but that's easy to fix.
--
Marc-Andre Lemburg
______________________________________________________________________
Business: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/