reading a line in file

Jay Loden python at jayloden.com
Mon Aug 20 17:41:09 EDT 2007


Shawn Milochik wrote:
> Although you're technically correct, I think there's a knee-jerk
> anti-regex reaction, citing the meaningless overhead. If you're
> running many thousands of records or something then it becomes a small
> issue compared to a replace statement or something. But in most cases
> it makes no difference at all.
> 
> Run the example script both ways and I'm sure there will be no
> difference, and I prefer a clear regex to a convoluted (in my opinion)
> substring call.
> 
> In any case, it's a preference, and I have never seen anything which
> convinced me that one should avoid regexes at all costs unless there's
> no other way to do it.
> 
> And the comment about solving a problem by using regular expressions
> creating another problem is just asinine.  Like so many other things,
> it's often repeated without any thought about whether it is true in
> general, much less in the situation in question.

I agree that in an extremely trivial example, it doesn't matter which method you use. Using a regex in this instance is *very* slightly slower in a single execution, and that isn't something anyone would notice. However, if the script is run many times, the difference accumulates: over 1000 iterations, avoiding regexes even in this trivial example saves ~10 seconds. The example is/was trivial, and if the real-life scenario is truly as inconsequential, then it won't matter. However, I've seen a great many instances in my life where similarly tiny details have come back to bite someone (myself included) later.
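The ~10 second figure depends on the original script and machine, so it's worth measuring for yourself. Below is a minimal sketch using the standard `timeit` module. It assumes a hypothetical task (pulling the text after a fixed `Name: ` prefix out of a line read from a file) because the OP's actual input isn't quoted here; `LINE`, `PREFIX`, and both function names are made up for illustration. It is written for Python 3, not the Python 2 of this thread.

```python
import re
import timeit

# Hypothetical sample line and prefix; not taken from the original thread.
LINE = "Name: Jay Loden\n"
PREFIX = "Name: "

# Compile once so the timing measures matching, not pattern compilation.
NAME_RE = re.compile(r"^Name: (.*)$")


def extract_with_regex(line):
    """Return the text after the prefix using a precompiled regex, or None."""
    match = NAME_RE.match(line.rstrip("\n"))
    return match.group(1) if match else None


def extract_with_slice(line):
    """Return the text after the prefix using plain string methods, or None."""
    line = line.rstrip("\n")
    if line.startswith(PREFIX):
        return line[len(PREFIX):]
    return None


if __name__ == "__main__":
    # Both approaches must agree before their speed is worth comparing.
    assert extract_with_regex(LINE) == extract_with_slice(LINE) == "Jay Loden"
    for func in (extract_with_regex, extract_with_slice):
        seconds = timeit.timeit(lambda: func(LINE), number=100_000)
        print(f"{func.__name__}: {seconds:.3f} s for 100,000 calls")
```

The absolute numbers will vary from machine to machine; the point is only that a per-call difference too small to notice once can become visible when multiplied by the number of calls.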

As an example, I once coded a simple database query script that took less than a second to run, and all was well until it was used to query the status of hundreds of items individually to generate a report. Those sub-second run times piled up on top of each other and resulted in 15-20 *minute* run times. A simple change to the query, one that had seemed unnecessary in the smaller case, ended up reducing that 15-minute run time to a few seconds.

There are plenty of reasons to use regular expressions, and you're also right that avoiding them at all costs isn't always the right approach either. There's no universal solution to any problem, which is all that was meant by my original comment. If you assume regular expressions are always the solution, then you may well miss a faster, more efficient, more elegant, or more readable solution. Conversely, ignoring regular expressions when they *are* the most efficient, elegant, and logical way to solve a problem is asinine, to borrow your phrasing.
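For the converse case, here is a minimal sketch of a line where a regex is arguably the clearer tool. It assumes a hypothetical line of whitespace-separated `key=value` pairs in which values may be double-quoted and contain spaces; `LINE`, `PAIR_RE`, and `parse_pairs` are illustrative names, not anything from the thread. A plain `str.split()` breaks the quoted value apart, while one pattern with `re.findall` handles both the quoted and unquoted forms.

```python
import re

# Hypothetical input: values may be bare or double-quoted (with spaces inside).
LINE = 'user=jay  status="in progress" items=42'

# Group 1: key; group 2: quoted value (quotes stripped); group 3: bare value.
PAIR_RE = re.compile(r'(\w+)=(?:"([^"]*)"|(\S+))')


def parse_pairs(line):
    """Return a dict mapping each key to its value, without surrounding quotes."""
    pairs = {}
    # With three groups, findall yields 3-tuples; the unused alternative is "".
    for key, quoted, bare in PAIR_RE.findall(line):
        pairs[key] = quoted if quoted else bare
    return pairs


if __name__ == "__main__":
    print(LINE.split())       # ['user=jay', 'status="in', 'progress"', 'items=42']
    print(parse_pairs(LINE))  # {'user': 'jay', 'status': 'in progress', 'items': '42'}
```

Doing the same with string methods alone would mean tracking whether you are inside quotes by hand, which is exactly the kind of convoluted code a single readable pattern avoids.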

I hope you don't feel I was picking on you, or on regular expressions for that matter! I was only attempting to offer another helpful solution that was possibly easier to read for someone not familiar with regular expressions, and a little faster in the event that multiple repetitions were required or the task itself was larger than the OP's description indicated.

-Jay


