efficiently splitting up strings based on substrings

Rhodri James rhodri at wildebst.demon.co.uk
Sat Sep 5 19:51:44 EDT 2009


On Sun, 06 Sep 2009 00:29:14 +0100, per <perfreem at gmail.com> wrote:

> it's exactly the same problem, except there are no constraints on the
> strings.  so the problem is, like you say, matching the substrings
> against the string x. in other words, finding out where x "aligns" to
> the ordered substrings abc, and then determine what chunk of x belongs
> to a, what chunk belongs to b, and what chunk belongs to c.
>
> so in the example i gave above, the substrings are: a = 1030405, b =
> 1babcf, c = fUUIUP, so abc = 10304051babcffUUIUP
>
> given a string x like 4051ba, i'd want to split it into the chunks a,
> b, and c. in this case, i'd want the result to be: ["405", "1ba"] --
> i.e. "405" is the chunk of x that belongs to a, and "1ba" the chunk
> that belongs to b. in this case, there are no chunks of c.  if x
> instead were "4051babcffUU", the right output is: ["405", "1babcf",
> "fUU"], which are the corresponding chunks of a, b, and c that make up
> x respectively.
>
> i'm not sure how to approach this. any ideas/tips would be greatly
> appreciated. thanks again.

I see, I think.  Let me explain it back to you, just to be sure.

You have a string x, and three component strings a, b and c.  x is
a substring of the concatenation of a, b and c (i.e. a+b+c).  You
want to find out how x overlaps a, b and c.

Assuming I've understood this right, you're overthinking the problem.
All you need to do is find the start of x in a+b+c, then compare that
offset and the length of x against the cumulative lengths of a, b and
c, and slice appropriately.
I'd scribble some example code, but it's nearly 1am and I'd be sure
to commit fence-post errors at this time of night.
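
For reference, the approach described above might be sketched roughly as
follows.  This is only an illustration, not code from the original
thread: the function name split_by_components and its signature are
made up here, and it assumes x occurs in a+b+c (if x occurs more than
once, str.find picks the first occurrence, which may or may not be the
alignment you want).

```python
def split_by_components(x, components):
    """Split x, a substring of ''.join(components), into per-component chunks.

    Hypothetical helper for illustration; returns only the non-empty chunks,
    in component order.
    """
    whole = "".join(components)
    start = whole.find(x)  # str.find returns -1 when x is not present
    if start < 0:
        raise ValueError("x is not a substring of the concatenation")
    end = start + len(x)  # x occupies whole[start:end]

    chunks = []
    offset = 0  # index in `whole` where the current component begins
    for comp in components:
        comp_end = offset + len(comp)
        # Intersect x's span [start, end) with this component's span
        # [offset, comp_end); half-open ranges keep the fence posts honest.
        lo = max(start, offset)
        hi = min(end, comp_end)
        if lo < hi:
            chunks.append(whole[lo:hi])
        offset = comp_end
    return chunks


a, b, c = "1030405", "1babcf", "fUUIUP"
print(split_by_components("4051ba", [a, b, c]))
print(split_by_components("4051babcffUU", [a, b, c]))
```

With the strings from the question, the two calls should give
["405", "1ba"] and ["405", "1babcf", "fUU"] respectively.  Taking a list
of components rather than exactly three arguments is a design choice
made for the sketch; the same arithmetic works for any number of pieces.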

-- 
Rhodri James *-* Wildebeest Herder to the Masses


