how to avoid leading white spaces

Chris Torek nospam at torek.net
Fri Jun 3 17:45:07 EDT 2011


>On 2011-06-03, rurpy at yahoo.com <rurpy at yahoo.com> wrote:
[prefers]
>>     re.split ('[ ,]', source)

This is probably not what you want in dealing with
human-created text:

    >>> re.split('[ ,]', 'foo bar, spam,maps')
    ['foo', 'bar', '', 'spam', 'maps']

Instead, you probably want "a comma followed by zero or
more whitespace characters; or, one or more whitespace characters":

    >>> re.split(r',\s*|\s+', 'foo bar, spam,maps')
    ['foo', 'bar', 'spam', 'maps']

or perhaps (depending on how you want to treat multiple
adjacent commas) even this:

    >>> re.split(r',+\s*|\s+', 'foo bar, spam,maps,, eggs')
    ['foo', 'bar', 'spam', 'maps', 'eggs']

although eventually you might want to just give in and use the
csv module. :-)  (Especially if you want to be able to quote
commas, for instance.)
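
For the quoted-comma case, here is a minimal sketch of what "giving in"
might look like.  The helper name split_fields is made up for this
illustration, and the sketch assumes fields are quoted with double
quotes; skipinitialspace=True tells the reader to ignore whitespace
that immediately follows a comma.

```python
import csv

def split_fields(line):
    """Parse one comma-separated line, honoring double-quoted fields."""
    # csv.reader accepts any iterable of lines; a one-element list is enough.
    # skipinitialspace=True drops whitespace right after each delimiter,
    # so ', "spam, maps"' is still recognized as a quoted field.
    reader = csv.reader([line], skipinitialspace=True)
    return next(reader)

print(split_fields('foo, "spam, maps", eggs'))
# prints ['foo', 'spam, maps', 'eggs']
```

Note that this only treats commas as separators; whitespace-separated
words such as 'foo bar' stay together in one field, so it is not a
drop-in replacement for the re.split() calls above.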

>> ...  With regexes the code is likely to be less brittle than a
>> dozen or more lines of mixed string functions, indexes, and
>> conditionals.

In article <94svm4Fe7eU1 at mid.individual.net>
Neil Cerutti  <neilc at norwich.edu> wrote:
[lots of snippage]
>That is the opposite of my experience, but YMMV.

I suspect it depends on how familiar the user is with regular
expressions, their abilities, and their limitations.

People relatively new to REs always seem to want to use them
to count (to balance parentheses, for instance).  People who
have gone through the compiler course know better. :-)
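
To make the counting point concrete: a regular expression in the
formal sense has no counter, so checking that parentheses balance is
usually done with a few lines of ordinary code instead.  A minimal
sketch (the function name parens_balanced is hypothetical, and it
only looks at round parentheses):

```python
def parens_balanced(text):
    """Return True if every '(' in text has a matching later ')'."""
    depth = 0  # number of currently open parentheses
    for ch in text:
        if ch == '(':
            depth += 1
        elif ch == ')':
            depth -= 1
            if depth < 0:
                # a ')' appeared before any matching '('
                return False
    # balanced only if nothing is left open at the end
    return depth == 0

print(parens_balanced('(a (b) c)'))
print(parens_balanced('(()'))
```

The first call prints True and the second prints False.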
-- 
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W)  +1 801 277 2603
email: gmail (figure it out)      http://web.torek.net/torek/index.html


