Speeding up the implementation of Stochastic Gradient Ascent in Python

Thu Jan 18 18:17:27 EST 2018

On 17/01/18 14:29, leutrim.kaleci at gmail.com wrote:
> 
> Hello everyone, 
> 
> I am implementing a time-dependent recommender system that applies BPR (Bayesian Personalized Ranking), where Stochastic Gradient Ascent is used to learn the parameters of the model. One iteration involves randomly sampling a quadruple (i.e. userID, positive_item, negative_item, epoch_index) n times, where n is the total number of positive feedbacks (i.e. the number of ratings given to all items). Because my implementation takes too long to learn the parameters (it requires 100 iterations), I was wondering if there is a way to improve my code and speed up the learning process.
> 
> Please find the code of my implementation at the following link (the update function, named updateFactors, is the one that learns the parameters, so I guess it is the one that should be improved in order to speed up learning):
> https://codereview.stackexchange.com/questions/183707/speeding-up-the-implementation-of-stochastic-gradient-ascent-in-python
> 

dim0 = []
dim1 = []
...
dim19 = []

dim0.append((asin, self.theta_item_per_bin[bin][itemID][0]))
dim1.append((asin,self.theta_item_per_bin[bin][itemID][1]))
...
dim19.append((asin,self.theta_item_per_bin[bin][itemID][19]))

for d in range(self.K2):
    if d == 0:
        max_value = max(dim0, key=operator.itemgetter(1))
        asin_ = max_value[0]
        value_ = max_value[1]
        print 'dim:',d,', itemID: ', asin_, ', value = ', value_

    if d == 1:
        max_value = max(dim1, key=operator.itemgetter(1))
        asin_ = max_value[0]
        value_ = max_value[1]
        print 'dim:',d,', itemID: ', asin_, ', value = ', value_

    ....

    if d == 19:
        max_value = max(dim19, key=operator.itemgetter(1))
        asin_ = max_value[0]
        value_ = max_value[1]
        print 'dim:',d,', itemID: ', asin_, ', value = ', value_

How about something like,

dims = [[] for _ in range(20)]

for i in range(20):
    dims[i].append((asin, self.theta_item_per_bin[bin][itemID][i]))

for j in range(self.K2):
    asin_, value_ = max(dims[j], key=operator.itemgetter(1))
    print 'dim:', j, ', itemID: ', asin_, ', value = ', value_

I haven't looked at it in detail, and I'm not sure why you have exactly
20 lists or what happens if self.K2 is not equal to 20. But you can make
the code simpler and shorter, and marginally more efficient, by not
testing all those if statements on every pass through the loop.
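
For illustration, here is a minimal, self-contained sketch of the same idea
that sizes the list of lists from the number of dimensions instead of a
literal 20. It is written for Python 3 (print as a function), unlike the
Python 2 snippets above. The function name max_item_per_dim, its parameters,
and the sample data are hypothetical stand-ins, since I don't know how
theta_item_per_bin is actually laid out in your class.

```python
import operator


def max_item_per_dim(theta_item_per_bin, bin_index, item_ids, num_dims):
    """Return, for each dimension, the (item_id, value) pair with the largest value."""
    # One list per dimension, sized from num_dims rather than a hard-coded 20.
    dims = [[] for _ in range(num_dims)]
    for item_id in item_ids:
        factors = theta_item_per_bin[bin_index][item_id]
        for d in range(num_dims):
            dims[d].append((item_id, factors[d]))
    # itemgetter(1) makes max() compare on the value, not on the item ID.
    return [max(pairs, key=operator.itemgetter(1)) for pairs in dims]


# Hypothetical data: one time bin, three items, two dimensions.
theta = {0: {"A1": [0.1, 0.9], "B2": [0.7, 0.2], "C3": [0.4, 0.5]}}
result = max_item_per_dim(theta, 0, ["A1", "B2", "C3"], 2)
for d, (item_id, value) in enumerate(result):
    print("dim:", d, ", itemID:", item_id, ", value =", value)
```

Passing self.K2 as num_dims would keep the number of lists and the number
of loop iterations in step, which avoids the question of what happens when
the two differ.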

Duncan