[SciPy-dev] chararray docstrings

Michael Droettboom mdroe at stsci.edu
Mon Oct 12 11:40:47 EDT 2009


I was able to make my big chararray commit today.  If I understand 
correctly, I need to wait 24 hours for the doc editor to sync with SVN, 
and then I should mark all the chararray-related docstrings as "needs 
review".

The primary change to the docstrings is that all of the methods of the 
chararray class are now free functions.  These free functions represent 
the "primary" entry points, and thus have detailed documentation, and 
the chararray methods now have short "pointer" docstrings to the free 
functions.
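
As a rough illustration of the free-function entry points (a sketch only, assuming they are reached through the numpy.char namespace and that NumPy is imported as np; the sample data is made up):

```python
import numpy as np

# Plain ndarray of strings; no chararray instance is needed to call
# the free functions.
words = np.array(["  alpha", "beta  ", "gamma"])

stripped = np.char.strip(words)       # element-wise str.strip
shouted = np.char.upper(stripped)     # element-wise str.upper
tagged = np.char.add(shouted, "!")    # element-wise concatenation

print(tagged)
```

Each call applies the corresponding stdlib string method element by element and returns a new array, leaving the input untouched.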

Where the docstring content itself has been updated, it is mainly to 
bring it closer to the Python standard library descriptions of these 
functions, which in most cases were more precise (since we are, in fact, 
calling the stdlib function under the hood) and concise (because the 
stdlib docs have been through a number of revisions and really get it 
right by now).

I do have a concern about one phrase that was used in a number of places 
that probably deserves some discussion:

"The chararray module exists for backwards compatibility with Numarray, 
it is not recommended for new development. If one needs arrays of 
strings, use arrays of dtype 
<http://docs.scipy.org/numpy/docs/numpy.dtype/#dtype> object."

There are many use cases (such as handling a binary structured format 
like FITS) where a dtype of 'string_' is more appropriate than a dtype 
of 'object_', and we shouldn't imply that all uses of chararray should 
now use object arrays.  Additionally, fast vectorized string operations 
will perform best on arrays of type 'string_' and 'unicode_'.  Though 
'object_' will work, it requires casting all objects to strings along 
the way, and could fail thousands of items into an operation.  It's a 
"best tool for the job" judgment call, not a "one tool fits all".  
Perhaps the above should read:

"If one needs arrays of strings, use arrays of dtype 
<http://docs.scipy.org/numpy/docs/numpy.dtype/#dtype> string_ or 
unicode_.  If one needs arrays of variable-length strings, use arrays of 
dtype object_."
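
To make the distinction concrete, here is a minimal sketch.  The 12-byte record layout is hypothetical (it is not the actual FITS layout), and the "S" and "U" type codes are used to request the fixed-width string_ and unicode_ dtypes:

```python
import numpy as np

# Hypothetical record layout for illustration: an 8-byte name field
# followed by a 4-byte little-endian integer.
record = np.dtype([("name", "S8"), ("count", "<i4")])
raw = (b"NGC1275\x00" + (42).to_bytes(4, "little")
       + b"M87\x00\x00\x00\x00\x00" + (7).to_bytes(4, "little"))

# The binary buffer maps directly onto the structured dtype.
table = np.frombuffer(raw, dtype=record)
names = table["name"]            # fixed-width bytes, no per-item Python objects
lowered = np.char.lower(names)   # vectorized over the whole column

# A fixed-width dtype truncates anything longer than its item size...
fixed = np.array(["short", "a much longer string"], dtype="U5")

# ...whereas an object array keeps each string at its own length, but it
# also accepts non-strings, which only surface once an operation hits them.
variable = np.array(["short", "a much longer string"], dtype=object)
mixed = np.array(["ok", 3.5], dtype=object)
```

The fixed-width column can be read straight out of the buffer and processed in bulk, while the object array trades that for arbitrary lengths and no guarantee that every item is actually a string.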

Mike

-- 
Michael Droettboom
Science Software Branch
Operations and Engineering Division
Space Telescope Science Institute
Operated by AURA for NASA



