[Tutor] best way to tokenize [was script too slow]
Jeff Shannon
jeff@ccvcorp.com
Wed Feb 26 14:33:01 2003
Paul Tremblay wrote:
>Okay, now I do see the whole thing. The list to join is: first split the
>token by "\\", which will get rid of the "\\", and then add the "\\" to
>each item.
>
>
>
>>> expandedWord = ' '.join(['\\'+item for item in
>>>word.split('\\') if item])
>>>
A good approach with this sort of thing is to try to spread it out into
several lines. One-liners can be convenient, but sometimes they're a
little confusing, especially if you're not terribly familiar with the
way that things fit together. So let's break this down into several steps.

    WordList = word.split('\\')
    TokenList = ['\\' + item for item in WordList if item]
    ExpandedWord = ' '.join(TokenList)

This code has the exact same effect as the one-liner, but is a little
bit easier to figure out (even though it may take a bit longer to read).
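
To check that claim, here is a small sketch that runs both versions on the same input. The sample value of word is made up for illustration (it is not Paul's actual data); the variable names follow the three lines above, and OneLiner is just a name I've added to hold the one-liner's result.

```python
# Hypothetical sample token, i.e. the text \par\b\i (each '\\' is one backslash).
word = '\\par\\b\\i'

# The original one-liner.
OneLiner = ' '.join(['\\' + item for item in word.split('\\') if item])

# The same work spread over three steps.
WordList = word.split('\\')      # ['', 'par', 'b', 'i'] -- note the empty first item
TokenList = ['\\' + item for item in WordList if item]   # drop empties, re-add backslash
ExpandedWord = ' '.join(TokenList)

print(ExpandedWord)              # \par \b \i
print(OneLiner == ExpandedWord)  # True
```

The empty string at the front of WordList is why the "if item" test is there: splitting a string that starts with the separator always yields an empty first element.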
I find myself using intermediate variables like this fairly often --
if it takes me more than a couple of seconds to figure out what a
compound expression is doing, I figure that it's too complex and would
be better split into several parts.

The decision of how much complexity is appropriate is, of course, a
very personal stylistic one. For instance, my second line above is
actually doing two things -- filtering out any empty strings from
WordList, and prepending '\\' to each remaining item. I could have
split that into two separate list comprehensions, and I could argue
that it would make things a little more explicit... but that would
also require two passes over the list instead of one, and would build
an extra intermediate list, so it could actually affect performance --
and if the list might be long, that effect could be significant. I
feel that the (marginal) extra clarity is not worth that possible
performance loss. On the other hand, splitting the original one-liner
into these three lines adds only a few name bindings and lookups.
That's a very small cost, so it's much easier to argue that the
increased clarity is worth it.
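
For anyone who wants to measure the difference rather than take my word for it, here is a rough sketch using the standard timeit module. The function names one_pass and two_pass, the test input, and the repeat count are all my own choices for illustration; I've left out any timing numbers, since they will vary with the machine and the Python version.

```python
import timeit

# Hypothetical long token: 3000 backslash-prefixed pieces.
word = '\\par\\b\\i' * 1000

def one_pass(word):
    # Filter and prepend in a single list comprehension.
    WordList = word.split('\\')
    return ['\\' + item for item in WordList if item]

def two_pass(word):
    # Filter first, then prepend: two comprehensions, one extra list.
    WordList = word.split('\\')
    NonEmpty = [item for item in WordList if item]
    return ['\\' + item for item in NonEmpty]

# Both versions must agree before the timings mean anything.
assert one_pass(word) == two_pass(word)

print('one pass:', timeit.timeit(lambda: one_pass(word), number=200))
print('two pass:', timeit.timeit(lambda: two_pass(word), number=200))
```

If the two numbers come out close on your data, that's a fair argument for whichever version you find easier to read.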

Personally, if I were writing code that I thought might be read by
others (especially, say, example code for this list), and that
probably covers most code in programs that will be in use for any
length of time, then I'd use the longer multi-line version. Only in a
quick script, or in a throw-away interactive session, would I use the
one-liner.

Jeff Shannon
Technician/Programmer
Credit International