String Template

Chris Angelico rosuav at gmail.com
Fri Dec 27 08:47:03 EST 2013


On Fri, Dec 27, 2013 at 11:55 PM,  <t.giuseppe at gmail.com> wrote:
> I'm rewriting a program previously written in C#, and in trying to keep the same configuration file, I have a problem with untapped strings.

Not sure what you mean by "untapped" here?

> Taking for example a classic line of apache log:
>
> 0.0.0.0 - [27/Dec/2013: 00:56:51 +0100] "GET / webdav / HTTP/1.1" 404 524 "-" "Mozilla/5.0 (Windows, U, Windows NT 5.1, en-US , rv: 1.9.2.12) Gecko/20101026 Firefox/3.6.12 "
>
> Is there any way to pull out the values so arranged as follows:
>
> ip = 0.0.0.0
> date = 27/Dec/2013: 00:56:51 +0100
> url = / webdav /
>

(Aside: Do you really have spaces in your URLs? That seems odd.)

One common way to implement this sort of thing is with a regular
expression. You can either derive a regex from your config file, or
have users directly manage the regex.

For the specific case of parsing the Apache common log format, there's
plenty of material around. This page [1] has a tidy regex that'll do
the job, and this module [2] purports to create a parser by reading
the Apache config line that defines the format. I don't know anything about
either, save that they came up in a Google search for 'python apache
common log', along with a whole lot of other decent-looking results.
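
As a rough illustration of the hand-written regex approach, here is a
minimal sketch that matches the sample line as quoted above (with the
user agent shortened). The pattern and the names LOG_RE, line and
fields are my own assumptions for this example; it is not the regex
from [1] and it does not cover the full Apache log grammar.

```python
import re

# Hand-written pattern for the sample line quoted above only.
# Named groups pull out the fields of interest.
LOG_RE = re.compile(
    r'(?P<ip>\S+) - \[(?P<date>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>.*?) (?P<protocol>HTTP/[\d.]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)

# Sample line from the original post, user agent shortened.
line = ('0.0.0.0 - [27/Dec/2013: 00:56:51 +0100] '
        '"GET / webdav / HTTP/1.1" 404 524 "-" "Mozilla/5.0"')

m = LOG_RE.match(line)
# groupdict() maps each group name to the text it captured.
fields = m.groupdict() if m else {}

for name in ('ip', 'date', 'url'):
    print(name, '=', fields.get(name))
```

The url group is non-greedy and is followed by a literal " HTTP/", so
it still works if the URL really does contain spaces.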

But for a more general solution - supposing you have piles and piles
of those parser strings - I'd be inclined to write a preparser that
reads your config file and derives regex patterns. It needs to figure
out what's a placeholder and what's literal text, then escape the
literal text (if there are regex metacharacters in it) and come up
with some sort of capturing sequence for the placeholder. I don't know
what you'd want there; possibly (.*?) will be the best (that means
"capture any number of characters, as few as possible"). But you know
your data far better than I do.
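
A minimal sketch of such a preparser follows. I don't know what your
config file looks like, so the {name} placeholder syntax, the
template string, and the function name template_to_regex are all
assumptions made up for this example. It uses Pattern.fullmatch,
which needs Python 3.4 or later.

```python
import re

# Assumed placeholder syntax: {name}. The real config format may differ.
PLACEHOLDER = re.compile(r'\{(\w+)\}')

def template_to_regex(template):
    """Compile a template with {name} placeholders into a regex."""
    parts = []
    pos = 0
    for m in PLACEHOLDER.finditer(template):
        # Literal text before this placeholder: escape metacharacters.
        parts.append(re.escape(template[pos:m.start()]))
        # The placeholder itself: a non-greedy named capture.
        parts.append('(?P<%s>.*?)' % m.group(1))
        pos = m.end()
    # Trailing literal text after the last placeholder.
    parts.append(re.escape(template[pos:]))
    return re.compile(''.join(parts))

# Hypothetical template describing the sample line from the post.
template = ('{ip} - [{date}] "{method} {url} HTTP/{version}" '
            '{status} {size} "{referer}" "{agent}"')
pattern = template_to_regex(template)

line = ('0.0.0.0 - [27/Dec/2013: 00:56:51 +0100] '
        '"GET / webdav / HTTP/1.1" 404 524 "-" "Mozilla/5.0"')

# fullmatch forces the last non-greedy group to run to the end.
m = pattern.fullmatch(line)
result = m.groupdict() if m else None
print(result)
```

One caveat with (.*?) everywhere: if the literal text between two
placeholders can also occur inside a value (a single space, say), the
split may land in the wrong place. Here the literal " HTTP/" after
{url} keeps that from happening; in other templates a tighter capture
such as \S+ might be a safer choice for some placeholders.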

ChrisA

[1] http://www.seehuhn.de/blog/52
[2] https://pypi.python.org/pypi/apachelog/1.0


