Re for Apache log file format

Cameron Simpson cs at zip.com.au
Tue Oct 8 18:17:44 EDT 2013


On 08Oct2013 10:59, Skip Montanaro <skip at pobox.com> wrote:
| > Aiui apache log format uses space as delimiter, encapsulates strings in
| > '"' characters, and uses '-' as an empty field.
| 
| Specifying the field delimiter as a space, you might be able to use
| the csv module to read these. I haven't done any Apache log file work
| since long before the csv module was available, but it just might
| work.

You can definitely do this. I pull things out of Apache log files
using awk in exactly this fashion: split each line on whitespace,
then stick the words of each multi-word field back together again.
It does rely on each of the "real" fields having a fixed number of
"words" in it.

And also in Python.
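
As a rough sketch of that split-and-rejoin idea (not taken from the
script below): the function name split_common_log and the field names
are made up for illustration, the sample line is in Common Log Format,
and the request is assumed to be exactly three words (method, path,
protocol).

```python
def split_common_log(logline):
    """Split a Common Log Format line on whitespace, then rejoin the
    words that belong to the multi-word fields.

    Hypothetical helper: assumes the time field is 2 words and the
    quoted request is exactly 3 words.
    """
    words = logline.split()
    if len(words) < 10:
        # Not enough words to be a line in the assumed format.
        return None
    return {
        "host": words[0],
        "ident": words[1],
        "user": words[2],
        "time": " ".join(words[3:5]),      # "[DD/Mon/YYYY:HH:MM:SS +hhmm]"
        "request": " ".join(words[5:8]),   # '"METHOD path PROTOCOL"'
        "status": words[8],
        "size": words[9],
    }


sample = ('127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
          '"GET /apache_pb.gif HTTP/1.0" 200 2326')
fields = split_common_log(sample)
```

A request with a different number of words (a malformed request, say)
would shift every later field, which is exactly the fixed-word-count
caveat above.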

I've got a merge-apache-logs script to read multiple logs, presumed
in time order, and produce a single output stream for passing to
log analysis tools:

  https://bitbucket.org/cameron_simpson/css/src/tip/bin/merge-apache-logs

It is a bit of a hack, but useful.

It has an "aptime" function to pull and parse the time field from
a log line; the function starts like this:

        def aptime(logline, zones, defaultZone):
          ''' Compute a datetime object from the supplied Apache log line.
              `defaultZone` is the timezone to use if it cannot be deduced.
          '''
          fields = logline.split()
          if len(fields) < 5:
            ##warning("bad log line: %s", logline)
            return None

          dt = None
          tzinfo = None

          # try for desired "[DD/Mon/YYYY:HH:MM:SS +hhmm]" format
          humantime, tzinfo = fields[3], fields[4]
          if len(humantime) == 21 \
          and humantime.startswith('[') \
          and tzinfo.endswith(']'):
            try:
              dt = datetime.strptime(humantime, "[%d/%b/%Y:%H:%M:%S")
            except ValueError:
              dt = None
            if dt is None:
              tzinfo = None
            else:
              tzinfo = tzinfo[:-1]

and proceeds otherwise (we have a few different log formats in play, alas).
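
For the common case, one way to finish the job is to turn the "+hhmm"
string left in tzinfo into a real timezone and attach it to dt. This is
only a sketch and not what the script itself does from here on: it
assumes Python 3 (for datetime.timezone), ignores the zones and
defaultZone parameters, and parse_offset is a made-up helper name.

```python
from datetime import datetime, timedelta, timezone


def parse_offset(offset):
    """Turn an Apache style '+hhmm' or '-hhmm' string into a tzinfo.

    Hypothetical helper: returns None if the string is not in that form.
    """
    if len(offset) != 5 or offset[0] not in "+-" or not offset[1:].isdigit():
        return None
    hours, minutes = int(offset[1:3]), int(offset[3:5])
    if hours > 23 or minutes > 59:
        return None
    sign = 1 if offset[0] == "+" else -1
    return timezone(sign * timedelta(hours=hours, minutes=minutes))


# The two pieces as the excerpt above leaves them: the naive datetime
# parsed from fields[3], and fields[4] with its trailing ']' removed.
dt = datetime.strptime("[08/Oct/2013:18:17:44", "[%d/%b/%Y:%H:%M:%S")
tz = parse_offset("-0400")
aware = dt.replace(tzinfo=tz) if tz is not None else dt
```

Note that %b matches month names in the current locale, so this relies
on an English (or C) locale for names like "Oct".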

So regexps are not your only choice here, and possibly not even the best choice.
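
The csv module suggestion quoted at the top can be sketched along these
lines, assuming a Common Log Format line and no escaped quotes inside
the quoted request (the sample line is illustrative only):

```python
import csv

sample = ('127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
          '"GET /apache_pb.gif HTTP/1.0" 200 2326')

# csv.reader accepts any iterable of lines, so a one-item list will do.
# With a space delimiter, the double-quoted request comes back as a
# single field with the quotes removed.
row = next(csv.reader([sample], delimiter=" ", quotechar='"'))

# The bracketed timestamp is not quoted, so it still arrives as two
# fields and has to be stuck back together by hand.
timestamp = " ".join(row[3:5])
request = row[5]
```

So csv takes care of the quoted strings, but the time field still needs
the same rejoin treatment as with awk.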

Cheers,
-- 
Cameron Simpson <cs at zip.com.au>

This is not a bug. It's just the way it works, and makes perfect sense.
        - Tom Christiansen <tchrist at jhereg.perl.com>
I like that line. I hope my boss falls for it.
        - Chaim Frenkel <chaimf at cris.com>


