Refactoring; arbitrary expression in lists

Thu Jan 13 00:32:53 EST 2005

On Thu, 13 Jan 2005 05:18:57 GMT, Bengt Richter <bokr at oz.net> wrote:
> On Thu, 13 Jan 2005 12:19:06 +1000, Stephen Thorne <stephen.thorne at gmail.com> wrote:
> 
> >On Thu, 13 Jan 2005 01:24:29 GMT, Bengt Richter <bokr at oz.net> wrote:
> >>     extensiondict = dict(
> >>         php = 'application/x-php',
> >>         cpp = 'text/x-c-src',
> >>         # etcetera
> >>         xsl = 'text/xsl'
> >>     )
> >>
> >>     def detectMimeType(filename):
> >>         extension = os.path.splitext(filename)[1].replace('.', '')
>            extension = os.path.splitext(filename)[1].replace('.', '').lower() # better
> 
> >>         try: return extensiondict[extension]
> >>         except KeyError:
> >>             basename = os.path.basename(filename)
> >>             if "Makefile" in basename: return 'text/x-makefile' # XXX case sensitivity?
> >>             raise NoMimeError
> >
> >Why not use a regexp-based approach?
> ISTM the dict setup closely reflects the OP's if/elif tests and makes for an efficient substitute
> for the functionality when later used for lookup. The regex list is O(n) and the regexes themselves
> are at least that, so I don't see a benefit. If you are going to loop through extensionlist, you
> might as well write (untested)
<code snipped>

*shrug*, O(n*m) actually, where n is the number of mime-types and m is
the length of the extension.

> >extensionlist = [
> >(re.compile(r'.*\.php') , "application/x-crap-language"),
> >(re.compile(r'.*\.(cpp|c)') , 'text/x-c-src'),
> >(re.compile(r'[Mm]akefile') , 'text/x-makefile'),
> >]
> >for regexp, mimetype in extensionlist:
> >  if regexp.match(filename):
> >     return mimetype
> >
> >if you were really concerned about efficiency, you could use something like:
> >class SimpleMatch:
> >  def __init__(self, pattern): self.pattern = pattern
> >  def match(self, subject): return subject[-len(self.pattern):] == self.pattern
> 
> I'm not clear on what you are doing here, but if you think you are going to compete
> with the timbot's dict efficiency with a casual few lines, I suspect you are PUI ;-)
> (Posting Under the Influence ;-)

Sorry about that. What I was trying to say was something along the lines of:

extensionlist = [
(re.compile(r'.*\.php') , "application/x-crap-language"),
(re.compile(r'.*\.(cpp|c)') , 'text/x-c-src'),
(re.compile(r'[Mm]akefile') , 'text/x-makefile'),
]
can be made more efficient by doing something like this:
extensionlist = [
(SimpleMatch(".php"), "application/x-crap-language"),
(re.compile(r'.*\.(cpp|c)') , 'text/x-c-src'),
(re.compile(r'[Mm]akefile') , 'text/x-makefile'),
]
Here SimpleMatch uses a slice and a comparison instead of the regular
expression engine. SimpleMatch and re.compile both return an object
whose .match(s) method returns a value that can be interpreted as a
boolean, so the loop over extensionlist does not need to change.
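
A minimal runnable sketch of that mixed list, for illustration only. It
assumes str.endswith as a stand-in for the slice-and-compare, keeps the
regexes exactly as written above, and uses a hypothetical function name
(detect_mime_type) that returns None when nothing matches:

```python
import re

class SimpleMatch:
    """Suffix test exposing the same .match(s) method as a compiled regex."""
    def __init__(self, pattern):
        self.pattern = pattern

    def match(self, subject):
        # Same result as subject[-len(pattern):] == pattern
        return subject.endswith(self.pattern)

extensionlist = [
    (SimpleMatch(".php"), "application/x-crap-language"),
    (re.compile(r'.*\.(cpp|c)'), 'text/x-c-src'),
    (re.compile(r'[Mm]akefile'), 'text/x-makefile'),
]

def detect_mime_type(filename):
    # Hypothetical wrapper: the first matcher that accepts the name wins.
    for matcher, mimetype in extensionlist:
        if matcher.match(filename):
            return mimetype
    return None
```

Because the loop only calls .match() and tests the truth of the result,
it does not care which kind of object it was handed.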

As for the overall efficiency concerns, I feel that talking about any
of this is premature optimisation. The optimisation that is really
required in this situation is the same as with any
large-switch-statement idiom, be it C or Python. First one must do a
frequency analysis of the inputs to the switch statement in order to
discover the optimal order of tests!
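
One way such a frequency analysis could look, as a sketch under stated
assumptions: the tests are mutually exclusive (otherwise reordering
changes the answer), and a representative sample of inputs is at hand.
The names order_by_frequency, tests and sample_inputs are made up for
illustration:

```python
from collections import Counter

def order_by_frequency(tests, sample_inputs):
    """Return tests reordered so the most frequently hit ones come first."""
    hits = Counter()
    for subject in sample_inputs:
        for index, (predicate, _result) in enumerate(tests):
            if predicate(subject):
                hits[index] += 1
                break  # first match wins, as in the lookup loop
    # sorted() is stable, so tests with equal counts keep their old order
    order = sorted(range(len(tests)), key=lambda i: -hits[i])
    return [tests[i] for i in order]

tests = [
    (lambda s: s.endswith(".php"), "application/x-php"),
    (lambda s: s.endswith(".cpp"), "text/x-c-src"),
    (lambda s: "Makefile" in s, "text/x-makefile"),
]
sample_inputs = ["a.cpp", "b.cpp", "Makefile", "c.cpp", "d.php", "Makefile"]
ordered = order_by_frequency(tests, sample_inputs)
```

With that sample the .cpp test moves to the front, followed by the
Makefile test and then the .php test.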

Regards,
Stephen Thorne