Python Data Utils
John Machin
sjmachin at lexicon.net
Mon Apr 7 05:17:24 EDT 2008
On Apr 7, 4:22 pm, Jesse Aldridge <JesseAldri... at gmail.com> wrote:
>
> > changing "( " to "(" and " )" to ")".
>
> Changed.
But then you introduced more.
>
> I attempted to take out everything that could be trivially implemented
> with the standard library.
> This has left me with... 4 functions in S.py. 1 of them is used
> internally, and the others aren't terribly awesome :\ But I think the
> ones that remain are at least a bit useful :)
If you want to look at stuff that can't be implemented trivially using
str/unicode methods, and is more than a bit useful, google for
mxTextTools.
>
> > A basic string normalisation-before-comparison function would
> > usefully include replacing multiple internal whitespace characters by
> > a single space.
>
> I added this functionality.
Not quite. I said "whitespace", not "space".
The following is the standard Python idiom for removing leading and
trailing whitespace and replacing one or more whitespace characters
with a single space:
def normalise_whitespace(s):
    return ' '.join(s.split())
If your data is obtained by web scraping, you may find some people use
'\xA0' aka NBSP to pad out fields. The above code will get rid of
these if s is unicode; if s is str, you need to chuck
a .replace('\xA0', ' ') in there somewhere.
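To make the two cases concrete, here is a minimal sketch written for Python 3, where str is already Unicode and the byte-string case corresponds to bytes. The helper name normalise_whitespace_bytes is hypothetical, and it assumes the bytes are Latin-1 or cp1252 encoded, so that NBSP is the single byte 0xA0; in UTF-8 data NBSP is two bytes and this replace would not be appropriate.

```python
def normalise_whitespace(s):
    """Strip leading/trailing whitespace; collapse internal runs to one space."""
    # str.split() with no argument splits on any Unicode whitespace,
    # which includes NBSP ('\xa0').
    return ' '.join(s.split())


def normalise_whitespace_bytes(b):
    """Hypothetical byte-string variant (assumes Latin-1/cp1252 input)."""
    # bytes.split() only knows ASCII whitespace, so NBSP has to be
    # turned into an ordinary space first.
    return b' '.join(b.replace(b'\xa0', b' ').split())


if __name__ == '__main__':
    print(repr(normalise_whitespace('  foo\t\tbar\xa0\xa0baz \n')))
    print(repr(normalise_whitespace_bytes(b'  foo\t\tbar\xa0\xa0baz \n')))
```

Doing the replace before the split means a run of mixed NBSPs and ordinary whitespace still collapses to a single space.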
HTH,
John