Python Data Utils

John Machin sjmachin at lexicon.net
Mon Apr 7 05:17:24 EDT 2008


On Apr 7, 4:22 pm, Jesse Aldridge <JesseAldri... at gmail.com> wrote:
>
> > changing "( " to "(" and " )" to ")".
>
> Changed.

But then you introduced more.

>
> I attempted to take out everything that could be trivially implemented
> with the standard library.
> This has left me with... 4 functions in S.py.  One of them is used
> internally, and the others aren't terribly awesome :\  But I think the
> ones that remain are at least a bit useful :)

If you want to look at stuff that can't be implemented trivially using
str/unicode methods, and is more than a bit useful, google for
mxTextTools.

>
> > A basic string normalisation-before-comparison function would
> > usefully include replacing multiple internal whitespace characters by
> > a single space.
>
> I added this functionality.

Not quite. I said "whitespace", not "space".

The following is the standard Python idiom for removing leading and
trailing whitespace and replacing each run of one or more whitespace
characters with a single space:

def normalise_whitespace(s):
    return ' '.join(s.split())
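
A quick illustration of what the idiom does, in case it helps. The
sample strings are made up for this example, and the function is
repeated only so that the snippet runs on its own:

```python
def normalise_whitespace(s):
    # split() with no argument splits on runs of any whitespace and
    # drops leading/trailing whitespace; join() then puts the pieces
    # back together separated by single spaces.
    return ' '.join(s.split())

# Tabs, newlines and repeated spaces all collapse to one space.
print(repr(normalise_whitespace('  foo \t bar\n\nbaz  ')))  # 'foo bar baz'

# A string containing only whitespace comes back empty.
print(repr(normalise_whitespace('   ')))  # ''
```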

If your data is obtained by web scraping, you may find that some people
use '\xA0' (NBSP, the no-break space) to pad out fields. The above code
will get rid of these if s is unicode; if s is a str, you need to chuck
a .replace('\xA0', ' ') in before the .split().
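
A minimal sketch of the two cases, written in Python 3 spelling, where
the text type is called str and the 8-bit type is called bytes. The
function names are hypothetical, and the bytes version assumes the data
is in a single-byte encoding such as Latin-1 where NBSP is the byte
0xA0. It would be wrong for UTF-8, where NBSP is the two bytes
0xC2 0xA0.

```python
def normalise_text(s):
    # Text (unicode) strings: split() already counts U+00A0 as
    # whitespace, so the plain idiom is enough.
    return ' '.join(s.split())


def normalise_latin1_bytes(b):
    # Byte strings: split() only recognises ASCII whitespace, so turn
    # the NBSP byte into an ordinary space first.
    # ASSUMPTION: b is Latin-1 (or similar single-byte) encoded.
    return b' '.join(b.replace(b'\xa0', b' ').split())


print(repr(normalise_text('a\xa0\xa0b \xa0 c')))
print(repr(normalise_latin1_bytes(b'a\xa0\xa0b \xa0 c')))
```

If you can, decode to text as early as possible and use only the first
form; then you don't need to know which byte the encoding uses for NBSP.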

HTH,
John



