suggestions for VIN parsing

Tim Chase python.list at tim.thechases.com
Thu Dec 25 21:13:55 EST 2014


On 2014-12-25 17:59, Vincent Davis wrote:
> These are vintage motorcycles so the "VIN's" are not like modern
> VIN's these are frame numbers and engine number.
> I don't want to parse the page, I want a function that given a VIN
> (frame or engine number) returns the year the bike was made.

While I've done automobile VIN processing, I'm unfamiliar with
motorcycle VIN processing.  Traditional VINs consist of 17
alphanumeric characters (minus look-alike letters), and then
certain offsets designate certain pieces of information
(manufacturer, model, year, country of origin, etc).
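For what it's worth, here is a rough sketch of that offset idea for
modern 17-character VINs (not the vintage frame numbers in question).
The name split_vin and the dictionary keys are my own invention; the
slices follow the usual WMI/VDS/VIS split, and the check-digit and
year-code positions are the North American convention.  It only checks
the shape of the string, not the check digit itself.

```python
import re

# 17 characters drawn from digits and capital letters, excluding the
# look-alikes I, O and Q.
VIN_RE = re.compile(r'^[A-HJ-NPR-Z0-9]{17}$')

def split_vin(vin):
    """Slice a modern 17-character VIN into its positional fields.

    split_vin and the dictionary keys are hypothetical names used
    for illustration only.
    """
    vin = vin.strip().upper()
    if not VIN_RE.match(vin):
        raise ValueError("Not a 17-character VIN: %r" % vin)
    return {
        'wmi': vin[0:3],         # world manufacturer identifier
        'vds': vin[3:9],         # vehicle descriptor section
        'check_digit': vin[8],   # position 9 (North American usage)
        'year_code': vin[9],     # position 10 (North American usage)
        'vis': vin[9:17],        # vehicle identifier section
    }
```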

If you can describe what an actual VIN looks like and how you want to
extract information from it, I'm sure folks here can come up with
something.

From a rough reading of that URL you provided, it sounds like the
format changed based on the year, so you would have to test against
a whole bunch of patterns to determine the year, and since some of the
patterns overlap, you'd have to do subsequent checking.  Something
like

import re

def vin_to_year(vin):
  # Rows are tried in order; the first row whose pattern matches and
  # whose number falls within min_val..max_val (inclusive) wins.
  for pattern, min_val, max_val, year in [
      (r'^(\d+)N$', 100, None, 1950),
      (r'^(\d+)NA$', 101, 15808, 1951),
      (r'^(\d+)$', 15809, 25000, 1951),
      (r'^(\d+)$', 25000, 32302, 1952),
      (r'^(\d+)$', 32303, 44134, 1953),
      (r'^(\d+)$', 44135, 56699, 1954),
      (r'^(\d+)$', 56700, 70929, 1955),
      # a whole bunch more like this
      ]:
    r = re.compile(pattern, re.I)
    m = r.match(vin)
    if m:
      if m.groups():
        i = int(m.group(1))
        # right shape but out of range: fall through to the next row
        if min_val is not None and i < min_val: continue
        if max_val is not None and i > max_val: continue
      return year
  # no row matched; alternatively:
  # return DEFAULT_YEAR
  raise ValueError("Unable to determine year")
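To see how the fall-through behaves, here is a self-contained sketch of
the same loop with the patterns compiled once up front and a few calls
at the end.  YEAR_TABLE is a name I made up, and the table holds only
some of the sample rows, not the full list from the page.

```python
import re

# Hypothetical name; sample rows only, each pattern compiled once.
YEAR_TABLE = [
    (re.compile(pattern, re.I), min_val, max_val, year)
    for pattern, min_val, max_val, year in [
        (r'^(\d+)N$', 100, None, 1950),
        (r'^(\d+)NA$', 101, 15808, 1951),
        (r'^(\d+)$', 15809, 25000, 1951),
        (r'^(\d+)$', 25000, 32302, 1952),
        (r'^(\d+)$', 32303, 44134, 1953),
    ]
]

def vin_to_year(vin):
    for regex, min_val, max_val, year in YEAR_TABLE:
        m = regex.match(vin.strip())
        if not m:
            continue
        number = int(m.group(1))
        if min_val is not None and number < min_val:
            continue  # right shape, number too low: try the next row
        if max_val is not None and number > max_val:
            continue  # right shape, number too high: try the next row
        return year
    raise ValueError("Unable to determine year")

for vin in ("5000N", "101na", "20000", "25000", "30000"):
    print(vin, vin_to_year(vin))
```

Since 25000 is both the end of one row and the start of the next, the
first matching row wins, so "25000" comes back as 1951 here.  Something
like "99N" matches the first pattern but is below its minimum, matches
nothing else, and ends up raising the ValueError.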

Based on that link, it also looks like you might have to deal with
partial-year ranges, so instead of just returning a year, you might
need/want to return (start_year, start_month, end_year, end_month)
instead of just the raw year.  Thus, you'd have to update the table
above to something like

  (r'^(\d+)$', 15809, 25000, 1951, 1, 1951, 12),
  (r'^KDA(\d+)$', None, None, 1980, 9, 1981, 4),
  (r'^K(\d+)$', None, None, 1974, 8, 1975, 7),
  # note that "K" has to follow "KDA" because of specificity

and then iterate over

  for (pattern, min_val, max_val,
       min_year, min_month, max_year, max_month) in [ ... ]:
    # ...
      return (min_year, min_month, max_year, max_month)
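Put together, a self-contained sketch of the range-returning variant
might look like this.  RANGE_TABLE and vin_to_date_range are names of
my choosing, and the table has just three sample rows, so treat the
dates as placeholders until they're checked against the page.

```python
import re

# Hypothetical table: (pattern, min_val, max_val,
#                      min_year, min_month, max_year, max_month)
RANGE_TABLE = [
    (r'^(\d+)$', 15809, 25000, 1951, 1, 1951, 12),
    (r'^KDA(\d+)$', None, None, 1980, 9, 1981, 4),
    (r'^K(\d+)$', None, None, 1974, 8, 1975, 7),
]

def vin_to_date_range(vin):
    """Return (min_year, min_month, max_year, max_month) for a VIN."""
    for (pattern, min_val, max_val,
         min_year, min_month, max_year, max_month) in RANGE_TABLE:
        m = re.match(pattern, vin.strip(), re.I)
        if not m:
            continue
        number = int(m.group(1))
        if min_val is not None and number < min_val:
            continue  # out of range for this row, try the next one
        if max_val is not None and number > max_val:
            continue
        return (min_year, min_month, max_year, max_month)
    raise ValueError("Unable to determine date range")

print(vin_to_date_range("KDA12345"))
print(vin_to_date_range("K12345"))
```

Returning a plain tuple keeps the caller free to turn it into dates or
just compare years, whichever suits.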

-tkc