a RegEx puzzle
Charles Hartman
charles.hartman at conncoll.edu
Fri Mar 11 10:05:22 EST 2005
> Charles Hartman <charles.hartman at conncoll.edu> wrote:
>> I'm still shaky on some of sre's syntax. Here's the task: I've got
>> strings (never longer than about a dozen characters) that are
>> guaranteed to be made only of characters 'x' and '/'.
>
> One possibility is to cheat completely, and depending on memory
> constraints
> and how often you need to do this, it might even make sense to do so.
Interesting -- I hadn't thought of that. It would, realistically,
probably be 2 ** 14 or so, but what the hell. Trouble is, this would
require me to pick out by hand/eye -- once only, admittedly --
something like 16,384 "longests," which I do not care to do. If I can
compute them, instead, there's no reason not to do it on the fly; it
doesn 't take *that* long.
> There's only two possible values for each character, so you're really
> dealing with binary numbers. The max length is 12 characters (I'm
> deliberately being oblivious to where you said "about" :-)), so there
> are
> only 2^12 or 4096 possible strings. You could probably afford to
> pre-compute all 4096 strings, use them as dictionary keys, and store a
> (start, end) tuple for each key. Computation then becomes a simple
> dictionary lookup.
>
>> In each string I
>> want to find the longest continuous stretch of pairs whose first
>> character is 'x' and the second is either mark. So if marks =
>> '/xx/xxx///', the "winning" stretch begins at position 2 and is 6
>> characters long ('x/xxx/'), which requires finding a second match that
>> overlaps the first match (which is just 'xx' in position 1). (When
>> there are multiple "winning" stretches, I might want to adjudicate
>> among them, but that's a separate problem.) I hope this is clear
>> enough.
>
> Unfortunately, no, it's not very clear. In fact, I have no clue what
> you're trying to do. Could you try explaining it again?
See below.
>> Charles Hartman
>> Professor of English, Poet in Residence
>
> Hmmm. Are you, by any chance, looking for meter patterns in verse?
Yes, of course. My program that scans iambic pentameters (Scandroid,
ver. 0.2a) is available (GPL) on the "Programs" page of my site listed
below. It pretty much works, which is interesting. (People who know how
to do this generally don't believe a program can do it.) I'm a week or
two (knock on wood) from version 1.0, which handles anapestic meters as
well, and non-pentameter lengths, and do other stuff.
I'm not sure which part of my explanation wasn't clear. The string ('/'
and 'x' only) will ultimately be divided into groups of one, two,
three, or four marks, by many complex operations. One approach to the
whole task (an approach I just discovered a month or two ago, which
supplements a more chunk-by-chunk method) is to look for the longest
substring within that string of (perhaps) 10 or twelve mrks, of which
the following is true: if the substring is divided into pairs, each
pair consists of a 'x' followed by either a 'x' or a '/'. The general
problem is to the find the longest possible substring matching an sre
specfication -- and a general solution to that doesn't seem to be
obvious, though the need for it must arise often enough.
What I want is code (a line, a function . . .) that returns the
beginning-point and length of that longest substring. (I could ask for
the substring itself; it doesn't matter; the information sought will be
some subset of the information supplied with every sre match object.)
For my current "best" solution, see previous message. Hope this is any
clearer.
Charles Hartman
Professor of English, Poet in Residence
http://cherry.conncoll.edu/cohar
http://villex.blogspot.com
More information about the Python-list
mailing list