Python Data Utils

John Machin sjmachin at lexicon.net
Sun Apr 6 18:10:51 EDT 2008


On Apr 7, 12:32 am, Jesse Aldridge <JesseAldri... at gmail.com> wrote:
> Thanks for the detailed feedback.  I made a lot of modifications based
> on your advice.  Mind taking another look?
>
> > Some names are a bit obscure - "universify"?
> > Docstrings would help too, and blank lines
>
> I changed the name of universify and added a docstrings to every
> function.

Docstrings go *after* the def statement.

>
> > ...PEP8
>
> I made a few changes in this direction, feel free to take it the rest
> of the way ;)

I doubt anyone will bother to take up your invitation. A few simple
changes would reduce eyestrain e.g. changing "( " to "(" and " )" to
")".

>
> > find_string is a much slower version of the find method of string objects,
>
> Got rid of find_string, and contains.  What are the others?

It seems that you could usefully spend some time reading the
documentation on str methods ... instead of asking other people to do
your job for you, unpaid. E.g. look for a str method that you could
use instead of at least one of the confusingly named "is_white" and
"is_blank"?

>
> > And I don't see what you gain from things like:
> > def count( s, sub ):
> >      return s.count( sub )
>
> Yeah, got rid of that stuff too.  I ported these files from Java a
> while ago, so there was a bit of junk like this lying around.

The penny drops :-)

> > delete_string, as a function, looks like it should delete some string, not
> > return a character; I'd use a string constant DELETE_CHAR, or just DEL,
> > it's name in ASCII.
>
> Got rid of that too :)
>
> > In general, None should be compared using `is` instead of `==`, and
> > instead of `type(x) is type(0)` or `type(x) == type(0)` I'd use
> > `isinstance(x, int)` (unless you use Python 2.1 or older, int, float, str,
> > list... are types themselves)
>
> Changed.

Not in all places ... look at the ends_with function. BTW, this should
be named something like "fuzzy_ends_with".

Why all the testing against None? If you have a convention that ""
means that a value is known to be the zero-length string whereas None
means that the true value is unknown, then:
(1) you should document that convention
(2) you should use it consistently e.g. fuzzy_match(None, None) should
return False.

The get_before function returns None in one case and "" in another; is
this accidental or deliberate?

>
> So, yeah, hopefully things are better now.
>
> Soon developers will flock from all over the world to build this into
> the greatest data manipulation library the world has ever seen!  ...or
> not...
>
> I'm tired.  Making code for other people is too much work :)

When you recover, here are a few more things to consider:

1. Testing if obj is a str or unicode object is best done by
isinstance(obj, basestring) ... you don't need an is_string function.

2. make_fuzzy function: first two statements should read "s =
s.replace(.....)" instead of "s.replace(.....)".

3. Fuzzy matching functions are specialised to an application; I can't
imagine that anyone would be particularly interested in those that you
provide. A basic string normalisation-before-comparison function would
usefully include replacing multiple internal whitespace characters by
a single space.

4. get_after('dog cat', 3) is a baroque and slow way of doing 'dog
cat'[3+1:]

5. Casual inspection of your indentation function gave the impression
that it was stuffed ... verified by a simple test:

>>> for i in range(11):
...    stest = ' ' * i + 'x'
...    print i, indentation(stest, 4)
...
0 0
1 1
2 0
3 0
4 1
5 2
6 1
7 1
8 2
9 3
10 2

HTH,
John




More information about the Python-list mailing list