[Numpy-discussion] Sparse feature vector generation

Samuel John scipy at samueljohn.de
Tue Jan 10 10:24:41 EST 2012


I would just use a lookup dict:

import numpy

names = [ "uc_berkeley", "stanford", "uiuc", "google", "intel", "texas_instruments", "bool"]
# map each feature name to its column index (name -> index, not index -> name)
lookup = dict( zip( names, range(len(names)) ) )


Now, given you have n entries:

S = numpy.zeros( (n, len(names)) ,dtype=numpy.int32)

for k in ["uc_berkeley", "google", "bool"]:
    S[0,lookup[k]] += 1

for k in ["stanford", "intel","bool"]: 
    S[1,lookup[k]] += 1

... and so forth, so that lookup[k] returns the column index to use for feature k.
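
Put together, the idea might look like the following. This is only a sketch: the `records` list and the `build_matrix` helper are made up for illustration, and it assumes each entry is a (school, company, flag) triple as in the question, with the flag written directly into the "bool" column.

```python
import numpy

names = ["uc_berkeley", "stanford", "uiuc", "google", "intel",
         "texas_instruments", "bool"]
# Map each feature name to its column index.
lookup = {name: i for i, name in enumerate(names)}

# Hypothetical input, shaped like the rows in the question.
records = [
    ("uc_berkeley", "google", 1),
    ("stanford", "intel", 1),
    ("uiuc", "texas_instruments", 0),
]

def build_matrix(records, lookup):
    """Return an (n_records, n_features) 0/1 matrix (illustrative helper)."""
    S = numpy.zeros((len(records), len(lookup)), dtype=numpy.int32)
    for row, (school, company, flag) in enumerate(records):
        S[row, lookup[school]] = 1
        S[row, lookup[company]] = 1
        S[row, lookup["bool"]] = flag  # copy the 0/1 flag as-is
    return S

S = build_matrix(records, lookup)
print(S)
```

An unknown name raises a KeyError here, which is probably what you want while the list of names is still being put together.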


Hope this helps. I am not aware of an automatic way to do this, but I may be wrong.
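
One thing that gets close without an explicit lookup dict is numpy.unique with return_inverse=True, which hands back the distinct labels together with each value's index into them. A rough sketch, assuming the two string columns and the flag are already in separate arrays (the array names and the `one_hot` helper are made up for this example); note that the columns come out in sorted order rather than in the order shown in the question:

```python
import numpy

# Hypothetical input columns, one array per field.
schools = numpy.array(["uc_berkeley", "stanford", "uiuc"])
companies = numpy.array(["google", "intel", "texas_instruments"])
flags = numpy.array([1, 1, 0])

def one_hot(values):
    """One-hot encode a 1-D array of strings (illustrative helper)."""
    # labels: sorted distinct values; inverse: column index of each entry.
    labels, inverse = numpy.unique(values, return_inverse=True)
    block = numpy.zeros((len(values), len(labels)), dtype=numpy.int32)
    block[numpy.arange(len(values)), inverse] = 1
    return labels, block

school_labels, school_block = one_hot(schools)
company_labels, company_block = one_hot(companies)

# Stack the two one-hot blocks and the flag column side by side.
S = numpy.hstack([school_block, company_block, flags.reshape(-1, 1)])
columns = list(school_labels) + list(company_labels) + ["bool"]
```
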
cheers, 
 Samuel


On 04.01.2012, at 07:25, Dhruvkaran Mehta wrote:

> Hi numpy users,
> 
> Is there a convenient way in numpy to go from "string" features like:
> 
> "uc_berkeley", "google", 1
> "stanford", "intel", 1
> .
> .
> .
> "uiuc", "texas_instruments", 0
> 
> to a numpy matrix like:
> 
>  "uc_berkeley", "stanford", ..., "uiuc", "google", "intel", "texas_instruments", "bool"
>           1                0         ...     0           1           0                0                       1
>           0                1         ...     0           0           1                0                       1 
>           :
>           0                0         ...     1           0           0                1                       0
> 
> I really appreciate you taking the time to help!
> Thanks!
> --Dhruv
> 
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion



