[Python-Dev] RFD: how to build strings from lots of slices?

M.-A. Lemburg mal@lemburg.com
Mon, 28 Feb 2000 19:15:15 +0100


Fredrik Lundh wrote:
> 
> when hacking on SRE's substitution code, I stumbled
> upon a problem.  to do a substitution, SRE needs to
> merge slices from the target strings and from the
> substitution pattern.
> 
> here's a simple example:
> 
>     re.sub(
>         "(perl|tcl|java)",
>         "python (not \\1)",
>         "perl rules"
>     )
> 
> contains a "substitution pattern" consisting of three
> parts:
> 
>     "python (not " (a slice from the substitution string)
>     group 1 (a slice from the target string)
>     ")" (a slice from the substitution string)
> 
> PCRE implements this by doing the slicing (thus creating
> three new strings), and then doing a "join" by hand into
> a PyString buffer.
> 
> this isn't very efficient, and it also doesn't work for
> unicode strings.

Why not? The Unicode implementation has an API,
PyUnicode_Join(), which does exactly this:

extern DL_IMPORT(PyObject*) PyUnicode_Join(
    PyObject *separator,    /* Separator string */
    PyObject *seq           /* Sequence object */
    );

Note that the PyUnicode_Join() API takes a sequence of
Unicode objects, strings or objects providing the
charbuf interface, coerces each of these into
a Unicode object and then joins them.
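
As a rough Python-level illustration of that coerce-then-join
behaviour, here is a minimal sketch. The helper name join_as_text
and its encoding parameter are made up for this example and are not
part of any Python API. It also assumes a current Python 3, where
str is the Unicode type and str.join() raises TypeError for bytes
items, so the decoding step is written out explicitly:

```python
def join_as_text(separator, seq, encoding="ascii"):
    """Hypothetical helper: decode byte-string items, then join as text."""
    parts = []
    for item in seq:
        # Assumption: byte strings are decoded with the given encoding;
        # text items are passed through unchanged.
        if isinstance(item, (bytes, bytearray)):
            item = bytes(item).decode(encoding)
        parts.append(item)
    # str.join() is the Python-level counterpart of joining with a separator.
    return separator.join(parts)


# The three pieces from the quoted example, one of them a byte string.
joined = join_as_text("", ["python (not ", b"perl", ")"])
```

With an empty separator the pieces are simply concatenated, which
is the case the substitution code needs.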

There is also a _PyUnicode_Resize() API. It is currently
not exported though... but that's easy to fix.
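
For reference, the slice-and-join approach from the quoted example
can be sketched in plain Python. This is only an illustration of the
idea under simple assumptions (a single match and a single "\1"
reference in the substitution string); it is not how PCRE or SRE
implement it, and the variable names are chosen for this example:

```python
import re

pattern = "(perl|tcl|java)"
template = "python (not \\1)"
target = "perl rules"

match = re.search(pattern, target)

# Position of the two-character group reference "\1" in the template.
ref = template.index("\\1")

pieces = [
    target[:match.start()],   # text before the match (empty here)
    template[:ref],           # slice from the substitution string
    match.group(1),           # slice from the target string
    template[ref + 2:],       # slice from the substitution string
    target[match.end():],     # text after the match
]

# One join over all the slices builds the final string.
result = "".join(pieces)
```

The middle three pieces are the three parts listed in the quoted
message; the first and last carry over the unmatched text.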

-- 
Marc-Andre Lemburg
______________________________________________________________________
Business:                                      http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/