[scikit-learn] Inconsistencies in clustering documentations

Beaugnon Anael anael.beaugnon at ssi.gouv.fr
Wed May 23 12:53:44 EDT 2018


Thanks for your answers.

DBSCAN has the correct doc because its fit_predict method is not
inherited: DBSCAN has its own implementation (because of the additional
parameter sample_weight).
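
To make the mechanism concrete, here is a minimal sketch in plain Python.
The class names are hypothetical stand-ins, not the real scikit-learn
classes; the only assumption is that a shared mixin provides a generic
fit_predict, which is how I understand ClusterMixin to work. An estimator
that inherits fit_predict also inherits its generic docstring, while an
estimator that overrides it (as DBSCAN does for sample_weight) carries
its own.

```python
import inspect


class ClusterMixinSketch:
    """Hypothetical stand-in for a mixin that provides a generic fit_predict."""

    def fit_predict(self, X, y=None):
        """X: Input data."""
        self.fit(X)
        return self.labels_


class InheritingEstimator(ClusterMixinSketch):
    """Hypothetical estimator that only defines fit (fit_predict is inherited)."""

    def fit(self, X):
        """X: Data matrix or, if affinity is 'precomputed', matrix of similarities."""
        self.labels_ = [0] * len(X)  # dummy labels, illustration only
        return self


class OverridingEstimator(ClusterMixinSketch):
    """Hypothetical estimator that overrides fit_predict to accept sample_weight."""

    def fit(self, X, sample_weight=None):
        """X: A feature array, or array of distances if metric='precomputed'."""
        self.labels_ = [0] * len(X)  # dummy labels, illustration only
        return self

    def fit_predict(self, X, y=None, sample_weight=None):
        """X: A feature array, or array of distances if metric='precomputed'."""
        self.fit(X, sample_weight=sample_weight)
        return self.labels_


# The inherited method keeps the mixin's generic docstring;
# the overriding method documents X in its own words.
inherited_doc = inspect.getdoc(InheritingEstimator.fit_predict)
overridden_doc = inspect.getdoc(OverridingEstimator.fit_predict)
print(inherited_doc)
print(overridden_doc)
```

If this picture is right, the fix for the other estimators is either to
override fit_predict with a proper docstring or to make the inherited
docstring more general.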

I have forked the sklearn repo. I work in a virtualenv (virtualenv venv3
--no-site-packages --python python3.5).
*python3 setup.py install* completes, but *make test-code* and *make
doc-noplot* fail.

Do you have any idea where these errors come from?
I intend to work on the Python 3 version. When I run make
test-code, I am surprised to see references to /usr/lib/python2.7/.
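
A quick way to check which interpreter and which pytest are actually
picked up is sketched below. It only uses the standard library; the
function name describe_environment is made up for this example. The
paths under /usr/lib/python2.7/dist-packages/_pytest in the log suggest
that the system-wide pytest may be the one being run rather than one
installed in the virtualenv, but that is an assumption to verify, not a
diagnosis.

```python
import importlib.util
import shutil
import sys


def describe_environment():
    """Collect facts about the running interpreter (hypothetical helper)."""
    # Old-style virtualenv sets sys.real_prefix; venv makes base_prefix differ.
    in_virtualenv = hasattr(sys, "real_prefix") or (
        getattr(sys, "base_prefix", sys.prefix) != sys.prefix
    )
    # Where this interpreter would import pytest from, if it is installed.
    spec = importlib.util.find_spec("pytest")
    return {
        "interpreter": sys.executable,
        "version": "%d.%d" % sys.version_info[:2],
        "in_virtualenv": in_virtualenv,
        # First "pytest" executable found on PATH, or None.
        "pytest_on_path": shutil.which("pytest"),
        "pytest_module": spec.origin if spec is not None else None,
    }


info = describe_environment()
for key in sorted(info):
    print("%s: %s" % (key, info[key]))
```

Run inside the activated virtualenv, every reported path should point
into venv3. If pytest_on_path points to /usr/bin or pytest_module is
None, pytest is probably not installed in the virtualenv.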

Thanks for your help,

Anaël Beaugnon

*make doc-noplot*

Exception occurred:
  File "/usr/lib/python3.5/zipfile.py", line 1435, in write
    st = os.stat(filename)
FileNotFoundError: [Errno 2] No such file or directory:
'/<dir>/scikit-learn/doc/auto_examples/plot_digits_pipe.ipynb'
The full traceback has been saved in /tmp/sphinx-err-ivjeif0v.log, if
you want to report the issue to the developers.
Please also report this if it was a user error, so that a better error
message can be provided next time.
A bug report can be filed in the tracker at
<https://github.com/sphinx-doc/sphinx/issues>. Thanks!

File /tmp/sphinx-err-ivjeif0v.log

# Sphinx version: 1.7.4
# Python version: 3.5.3 (CPython)
# Docutils version: 0.14
# Jinja2 version: 2.10
# Last messages:

# Loaded extensions:
Traceback (most recent call last):
  File
"/<dir>/scikit-learn/venv3/lib/python3.5/site-packages/sphinx/cmdline.py",
line 303, in main
    args.warningiserror, args.tags, args.verbosity, args.jobs)
  File
"/<dir>/scikit-learn/venv3/lib/python3.5/site-packages/sphinx/application.py",
line 233, in __init__
    self._init_builder()
  File
"/<dir>/scikit-learn/venv3/lib/python3.5/site-packages/sphinx/application.py",
line 311, in _init_builder
    self.emit('builder-inited')
  File
"/<dir>/scikit-learn/venv3/lib/python3.5/site-packages/sphinx/application.py",
line 444, in emit
    return self.events.emit(event, self, *args)
  File
"/<dir>/scikit-learn/venv3/lib/python3.5/site-packages/sphinx/events.py",
line 79, in emit
    results.append(callback(*args))
  File
"/<dir>/scikit-learn/venv3/lib/python3.5/site-packages/sphinx_gallery/gen_gallery.py",
line 247, in generate_gallery_rst
    download_fhindex = generate_zipfiles(gallery_dir)
  File
"/<dir>/scikit-learn/venv3/lib/python3.5/site-packages/sphinx_gallery/downloads.py",
line 115, in generate_zipfiles
    jy_zipfile = python_zip(listdir, gallery_dir, ".ipynb")
  File
"/<dir>/scikit-learn/venv3/lib/python3.5/site-packages/sphinx_gallery/downloads.py",
line 69, in python_zip
    zipf.write(file_src, os.path.relpath(file_src, gallery_path))
  File "/usr/lib/python3.5/zipfile.py", line 1435, in write
    st = os.stat(filename)
FileNotFoundError: [Errno 2] No such file or directory:
'/<dir>/scikit-learn/doc/auto_examples/plot_digits_pipe.ipynb'


*make test-code*

=======================================================================
ERRORS
=======================================================================
_________________________________________________________________ ERROR
collecting 
__________________________________________________________________
/usr/lib/python2.7/dist-packages/py/_path/common.py:366: in visit
    for x in Visitor(fil, rec, ignore, bf, sort).gen(self):
/usr/lib/python2.7/dist-packages/py/_path/common.py:405: in gen
    if p.check(dir=1) and (rec is None or rec(p))])
/usr/lib/python2.7/dist-packages/_pytest/main.py:682: in _recurse
    ihook = self.gethookproxy(path)
/usr/lib/python2.7/dist-packages/_pytest/main.py:587: in gethookproxy
    my_conftestmodules = pm._getconftestmodules(fspath)
/usr/lib/python2.7/dist-packages/_pytest/config.py:339: in
_getconftestmodules
    mod = self._importconftest(conftestpath)
/usr/lib/python2.7/dist-packages/_pytest/config.py:364: in _importconftest
    raise ConftestImportFailure(conftestpath, sys.exc_info())
E   ConftestImportFailure: ImportError('No module named
_check_build\n___________________________________________________________________________\nContents
of /<dir>/scikit-learn/sklearn/__check_build:\n__pycache__              
setup.py                  __init__.pyc\n_check_build.pyx         
_check_build.cpython-35m-x86_64-linux-gnu.so_check_build.c\n__init__.py\n___________________________________________________________________________\nIt
seems that scikit-learn has not been built correctly.\n\nIf you have
installed scikit-learn from source, please do not forget\nto build the
package before using it: run `python setup.py install` or\n`make` in the
source directory.\n\nIf you have used an installer, please check that it
is suited for your\nPython version, your operating system and your
platform.',)
E     File "/<dir>/scikit-learn/sklearn/__init__.py", line 63, in <module>
E       from . import __check_build
E     File "/<dir>/scikit-learn/sklearn/__check_build/__init__.py", line
46, in <module>
E       raise_build_error(e)
E     File "/<dir>/scikit-learn/sklearn/__check_build/__init__.py", line
41, in raise_build_error
E       %s""" % (e, local_dir, ''.join(dir_content).strip(), msg))
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Interrupted: 1
errors during collection
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
============================================================== 1 error
in 0.27 seconds
===============================================================




On 23/05/2018 at 18:09, Andreas Mueller wrote:
>
> +1 for a PR on fit_predict docs. This is probably due to the
> inheritance structure.
> Though it's weird that DBSCAN has the correct docs.
>
> I'm not sure about renaming affinity, but we can discuss that. I agree
> it's misleading.
>
>
> On 5/23/18 8:01 AM, Tom DLT wrote:
>> Hi Anaël,
>>
>> Thanks for spotting these inconsistencies.
>> You are very welcome to open pull-requests and/or issues on the
>> GitHub tracker
>> (cf. http://scikit-learn.org/stable/developers/contributing.html#contributing-code)
>> The documentation issue should be straightforward.
>> The parameter renaming would need a proper deprecation cycle (cf
>> http://scikit-learn.org/stable/developers/contributing.html#deprecation).
>>
>> See you on GitHub,
>>
>> Tom
>>
>> 2018-05-23 11:50 GMT+02:00 Beaugnon Anael <anael.beaugnon at ssi.gouv.fr>:
>>
>>     Dear all,
>>
>>     Three clustering algorithms can take as input distance or
>>     similarity matrices instead of the observations
>>     (AgglomerativeClustering
>>     <http://scikit-learn.org/stable/modules/generated/sklearn.cluster.AgglomerativeClustering.html#sklearn.cluster.AgglomerativeClustering>,
>>     AffinityPropagation
>>     <http://scikit-learn.org/stable/modules/generated/sklearn.cluster.AffinityPropagation.html#sklearn.cluster.AffinityPropagation>,
>>     and DBSCAN
>>     <http://scikit-learn.org/stable/modules/generated/sklearn.cluster.DBSCAN.html#sklearn.cluster.DBSCAN>),
>>     but there are inconsistencies in their documentation.
>>
>>
>>     *DBSCAN:*
>>        The documentation explains clearly how to run DBSCAN with a
>>     precomputed distance matrix.
>>        Constructor:
>>            metric: If metric is “precomputed”, X is assumed to be a
>>     distance matrix and must be square.
>>        fit / fit_predict:
>>            X: A feature array, or array of distances between samples
>>     if metric='precomputed'.
>>
>>
>>     *AffinityPropagation:*
>>         Constructor:
>>             affinity: Which affinity to use. At the moment
>>     'precomputed' and 'euclidean' are supported. 'euclidean' uses the
>>     negative squared euclidean distance between points.
>>         fit:
>>             X: Data matrix or, if affinity is 'precomputed', matrix
>>     of similarities / affinities.
>>         fit_predict:
>>             X: Input data.
>>             Can X also be a matrix of similarities? Shouldn't fit and
>>     fit_predict share the same documentation for the input X?
>>
>>
>>     *AgglomerativeClustering:*
>>         Constructor:
>>             affinity: Metric used to compute the linkage. Can be
>>     “euclidean”, “l1”, “l2”, “manhattan”, “cosine”, or “precomputed”.
>>     If linkage is “ward”, only “euclidean” is accepted.
>>             The name of the parameter 'affinity' seems misleading,
>>     since it does not correspond to similarity functions, but to
>>     distance functions.
>>         fit:
>>             X: The samples a.k.a. observations.
>>         fit_predict:
>>             X: Input data.
>>             The documentation of fit and fit_predict does not
>>     specify that X can also be a matrix of distances.
>>
>>     The user may be unsure whether he/she should provide a distance
>>     or a similarity matrix to AgglomerativeClustering.
>>     The documentation of fit and fit_predict can be easily updated.
>>     As for the name of the 'affinity' parameter, it is more difficult
>>     since it involves an API change.
>>
>>
>>     What do you think of these potential updates of the documentation ?
>>
>>     Cheers,
>>
>>     Anaël Beaugnon
>>
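
To illustrate why the distinction between the two kinds of precomputed
input matters, here is a small sketch in plain Python (no scikit-learn
needed). The helper names are made up for the example. It builds a
distance matrix and a similarity matrix from the same points, using the
negative squared euclidean distance quoted above from the
AffinityPropagation documentation as the similarity. The two matrices
order pairs of points in opposite ways, so passing one where the other
is expected would silently give wrong clusters.

```python
def squared_euclidean(a, b):
    """Squared euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def distance_matrix(points):
    """Pairwise euclidean distances: small values mean 'close'."""
    return [[squared_euclidean(p, q) ** 0.5 for q in points] for p in points]


def similarity_matrix(points):
    """Pairwise negative squared euclidean distances: large values mean 'close'."""
    return [[-squared_euclidean(p, q) for q in points] for p in points]


# Three collinear points; the first two are closer than the first and third.
points = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
D = distance_matrix(points)
S = similarity_matrix(points)

# Closer pair: smaller distance, but larger similarity.
print(D[0][1] < D[0][2])
print(S[0][1] > S[0][2])
```

Under this reading, D is what an estimator expecting distances should
receive, and S is what an estimator expecting similarities should
receive.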