Refactoring; arbitrary expression in lists
Stephen Thorne
stephen.thorne at gmail.com
Thu Jan 13 00:32:53 EST 2005
On Thu, 13 Jan 2005 05:18:57 GMT, Bengt Richter <bokr at oz.net> wrote:
> On Thu, 13 Jan 2005 12:19:06 +1000, Stephen Thorne <stephen.thorne at gmail.com> wrote:
>
> >On Thu, 13 Jan 2005 01:24:29 GMT, Bengt Richter <bokr at oz.net> wrote:
> >> extensiondict = dict(
> >> php = 'application/x-php',
> >> cpp = 'text/x-c-src',
> >> # etcetera
> >> xsl = 'text/xsl'
> >> )
> >>
> >> def detectMimeType(filename):
> >> extension = os.path.splitext(filename)[1].replace('.', '')
> extension = os.path.splitext(filename)[1].replace('.', '').lower() # better
>
> >> try: return extensiondict[extension]
> >> except KeyError:
> >> basename = os.path.basename(filename)
> >> if "Makefile" in basename: return 'text/x-makefile' # XXX case sensitivity?
> >> raise NoMimeError
> >
> >Why not use a regexp based approach.
> ISTM the dict setup closely reflects the OP's if/elif tests and makes for an efficient substitute
> for the functionality when later used for lookup. The regex list is O(n) and the regexes themselves
> are at least that, so I don't see a benefit. If you are going to loop through extensionlist, you
> might as well write (untested)
<code snipped>
*shrug*, O(n*m) actually, where n is the number of mime-types and m is
the length of the extension.
> >extensionlist = [
> >(re.compile(r'.*\.php') , "application/x-crap-language"),
> >(re.compile(r'.*\.(cpp|c)') , 'text/x-c-src'),
> >(re.compile(r'[Mm]akefile') , 'text/x-makefile'),
> >]
> >for regexp, mimetype in extensionlist:
> > if regexp.match(filename):
> > return mimetype
> >
> >if you were really concerned about efficiency, you could use something like:
> >class SimpleMatch:
> > def __init__(self, pattern): self.pattern = pattern
> > def match(self, subject): return subject[-len(self.pattern):] == self.pattern
>
> I'm not clear on what you are doing here, but if you think you are going to compete
> with the timbot's dict efficiency with a casual few lines, I suspect you are PUI ;-)
> (Posting Under the Influence ;-)
Sorry about that, what I was trying to say was something along the lines of:
extensionlist = [
(re.compile(r'.*\.php') , "application/x-crap-language"),
(re.compile(r'.*\.(cpp|c)') , 'text/x-c-src'),
(re.compile(r'[Mm]akefile') , 'text/x-makefile'),
]
can be made more efficient by doing something like this:
extensionlist = [
(SimpleMatch(".php"), "application/x-crap-language"),
(re.compile(r'.*\.(cpp|c)') , 'text/x-c-src'),
(re.compile(r'[Mm]akefile') , 'text/x-makefile'),
]
Here SimpleMatch uses a slice and a comparison instead of the regular
expression engine. SimpleMatch(...) and re.compile(...) both return an
object whose .match(s) method returns a value that can be interpreted
as a boolean, so the lookup loop does not care which one it gets.
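To make that concrete, here is a rough, self-contained sketch of the mixed
list in use. The function name detect_mime_type and the None return for
"no match" are my own choices for illustration, not anything from the
original code.

```python
import re


class SimpleMatch:
    """Suffix test that exposes the same .match() method as a compiled regex."""

    def __init__(self, pattern):
        self.pattern = pattern

    def match(self, subject):
        # Slice and compare; no regular expression engine involved.
        return subject[-len(self.pattern):] == self.pattern


extensionlist = [
    (SimpleMatch(".php"), "application/x-crap-language"),
    (re.compile(r'.*\.(cpp|c)'), 'text/x-c-src'),
    (re.compile(r'[Mm]akefile'), 'text/x-makefile'),
]


def detect_mime_type(filename):
    # Hypothetical helper: returns the first matching mime type, or None.
    for matcher, mimetype in extensionlist:
        # Works for both kinds of object, since only the truthiness
        # of .match() is used.
        if matcher.match(filename):
            return mimetype
    return None
```

The loop only relies on the truthiness of .match(), so regex objects and
SimpleMatch instances can sit side by side in the same list.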
As for the overall efficiency concerns, I feel that talking about any
of this is premature optimisation. The optimisation that is really
required in this situation is the same as with any
large-switch-statement idiom, be it C or Python. First one must do a
frequency analysis of the inputs to the switch statement in order to
discover the optimal order of tests!
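As a rough sketch of what such a frequency analysis might look like: count
which test fires for each input in a representative sample, then put the
most frequently hit tests first. The name order_by_frequency and the
(predicate, result) pair layout are assumptions for illustration, and
reordering is only safe when the tests are mutually exclusive, since
otherwise changing the order changes the answers.

```python
from collections import Counter


def order_by_frequency(tests, sample_inputs):
    """Return tests sorted so the ones that match most often come first.

    tests is a list of (predicate, result) pairs; sample_inputs is a
    representative sample of the values the tests will be run against.
    """
    hits = Counter()
    for value in sample_inputs:
        for index, (predicate, _result) in enumerate(tests):
            if predicate(value):
                hits[index] += 1
                break  # first match wins, as in the lookup loop
    # sorted() is stable, so tests with equal counts keep their order.
    order = sorted(range(len(tests)), key=lambda i: -hits[i])
    return [tests[i] for i in order]
```

Whether the reordering is worth it depends entirely on how skewed the
real inputs are, which is the point of measuring first.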
Regards,
Stephen Thorne