Help on error " ValueError: For numerical factors, num_columns must be an int "
Josef Pktd
josef.pktd at gmail.com
Wed Dec 16 20:57:01 EST 2015
On Wednesday, December 16, 2015 at 9:50:35 AM UTC-5, Robert wrote:
> On Wednesday, December 16, 2015 at 6:34:21 AM UTC-5, Mark Lawrence wrote:
> > On 16/12/2015 10:44, Robert wrote:
> > > Hi,
> > >
> > > When I run the following code, there is an error:
> > >
> > > ValueError: For numerical factors, num_columns must be an int
> > >
> > >
> > > ================
> > > import numpy as np
> > > import pandas as pd
> > > from patsy import dmatrices
> > > from sklearn.linear_model import LogisticRegression
> > >
> > > X = [0.5,0.75,1.0,1.25,1.5,1.75,1.75,2.0,2.25,2.5,2.75,3.0,3.25,
> > > 3.5,4.0,4.25,4.5,4.75,5.0,5.5]
> > > y = [0,0,0,0,0,0,1,0,1,0,1,0,1,0,1,1,1,1,1,1]
> > >
> > > zipped = list(zip(X,y))
> > > df = pd.DataFrame(zipped,columns = ['study_hrs','p_or_f'])
> > >
> > > y, X = dmatrices('p_or_f ~ study_hrs', df, return_type="dataframe")
> > > =======================
> > >
> > > I have check 'df' is this type:
> > > =============
> > > type(df)
> > > Out[25]: pandas.core.frame.DataFrame
> > > =============
> > >
> > > I cannot figure out where the problem is. Can you help me?
> > > Thanks.
> > >
> > > Error message:
> > > ..........
> > >
> > >
> > > ---------------------------------------------------------------------------
> > > ValueError Traceback (most recent call last)
> > > C:\Users\rj\pyprj\stackoverflow_logisticregression0.py in <module>()
> > > 17 df = pd.DataFrame(zipped,columns = ['study_hrs','p_or_f'])
> > > 18
> > > ---> 19 y, X = dmatrices('p_or_f ~ study_hrs', df, return_type="dataframe")
> > > 20
> > > 21 y = np.ravel(y)
> > >
> > > C:\Users\rj\AppData\Local\Enthought\Canopy\User\lib\site-packages\patsy\highlevel.pyc in dmatrices(formula_like, data, eval_env, NA_action, return_type)
> > > 295 eval_env = EvalEnvironment.capture(eval_env, reference=1)
> > > 296 (lhs, rhs) = _do_highlevel_design(formula_like, data, eval_env,
> > > --> 297 NA_action, return_type)
> > > 298 if lhs.shape[1] == 0:
> > > 299 raise PatsyError("model is missing required outcome variables")
> > >
> > > C:\Users\rj\AppData\Local\Enthought\Canopy\User\lib\site-packages\patsy\highlevel.pyc in _do_highlevel_design(formula_like, data, eval_env, NA_action, return_type)
> > > 150 return iter([data])
> > > 151 design_infos = _try_incr_builders(formula_like, data_iter_maker, eval_env,
> > > --> 152 NA_action)
> > > 153 if design_infos is not None:
> > > 154 return build_design_matrices(design_infos, data,
> > >
> > > C:\Users\rj\AppData\Local\Enthought\Canopy\User\lib\site-packages\patsy\highlevel.pyc in _try_incr_builders(formula_like, data_iter_maker, eval_env, NA_action)
> > > 55 data_iter_maker,
> > > 56 eval_env,
> > > ---> 57 NA_action)
> > > 58 else:
> > > 59 return None
> > >
> > > C:\Users\rj\AppData\Local\Enthought\Canopy\User\lib\site-packages\patsy\build.pyc in design_matrix_builders(termlists, data_iter_maker, eval_env, NA_action)
> > > 704 factor_states[factor],
> > > 705 num_columns=num_column_counts[factor],
> > > --> 706 categories=None)
> > > 707 else:
> > > 708 assert factor in cat_levels_contrasts
> > >
> > > C:\Users\rj\AppData\Local\Enthought\Canopy\User\lib\site-packages\patsy\design_info.pyc in __init__(self, factor, type, state, num_columns, categories)
> > > 86 if self.type == "numerical":
> > > 87 if not isinstance(num_columns, int):
> > > ---> 88 raise ValueError("For numerical factors, num_columns "
> > > 89 "must be an int")
> > > 90 if categories is not None:
> > >
> > > ValueError: For numerical factors, num_columns must be an int
> > >
> >
> > Slap the ValueError into a search engine and the first hit is
> > https://groups.google.com/forum/#!topic/pystatsmodels/KcSzNqDxv-Q
This was fixed in patsy 0.4.1 as discussed in this statsmodels thread.
You need to upgrade patsy from 0.4.0.
AFAIR, the type checking was too strict and broke with recent numpy versions.
Josef
> >
> > --
> > My fellow Pythonistas, ask not what our language can do for you, ask
> > what you can do for our language.
> >
> > Mark Lawrence
>
> Hi,
> I don't see a solution to my problem. I find the following demo code from
>
> https://patsy.readthedocs.org/en/v0.1.0/API-reference.html#patsy.dmatrix
>
> It doesn't work either on the Canopy. Does it work on your computer?
> Thanks,
>
> /////////////
> demo_data("a", "x", nlevels=3)
> Out[134]:
> {'a': ['a1', 'a2', 'a3', 'a1', 'a2', 'a3'],
> 'x': array([ 1.76405235, 0.40015721, 0.97873798, 2.2408932 , 1.86755799,
> -0.97727788])}
>
> mat = dmatrix("a + x", demo_data("a", "x", nlevels=3))
More information about the Python-list
mailing list