Review Request of Python Code

Wed Mar 9 00:10:51 EST 2016

On Wednesday 09 March 2016 15:18, subhabangalore at gmail.com wrote:

> I am trying to copy the code here, for your kind review.
> 
> import MySQLdb
> import nltk
> def sql_connect_NewTest1():

This function says that it connects to the SQL database, but actually does 
much more. It does too much. Split your big function into small functions 
that do one thing each.

Your code has too many generic variable names like "var1" ("oh, this is 
variable number 1? how useful to know!") and too many commented out dead 
lines which make it hard to read. There are too many temporary variables 
that get used once, then never used again. You should give your variables 
names which explain what they are or what they are used for. You need to use 
better comments: explain *why* you do things, don't just write a comment 
that repeats what the code does:

dict_open = open(...) #OPENING THE DICTIONARY FILE 

That comment is useless. The code tells us that you are opening the 
dictionary file.
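
By contrast, a comment that records the reason behind the code tells the
reader something the code cannot. A small sketch of what I mean (the
function name and the stated reason are made up for illustration, they are
not taken from your code):

```python
def split_into_sentences(text):
    # Split on "." because (we assume, for this example) the news text has
    # no other reliable sentence marker. Empty pieces are dropped so that a
    # trailing "." does not produce a blank sentence.
    return [part.strip() for part in text.split(".") if part.strip()]
```

The comment does not repeat "split the text"; it says why the split is done
this way and why the filter is there.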

Because I don't completely understand what your code is trying to do, I 
cannot simplify the code or rewrite it very well. But I've tried. Try this, 
and see if it helps. If not, try simplifying the code some more, explain 
what it does better, and then we'll see if we can speed it up.

import MySQLdb
import nltk

def get_words(filename):
    """Return words from a dictionary file."""
    with open(filename, "r") as f:
        words = f.read().split()
    return words

def join_suffix(word, suffix):
    return word + "/" + suffix

def split_sentence(alist, size):
    """Split sentence (a list of words) into chunks of the given size."""
    return [alist[i:i+size] for i in range(0, len(alist), size)]

def process():
    db = MySQLdb.connect(host="localhost",
                         user="*****",
                         passwd="*****",
                         db="abcd_efgh")
    cur = db.cursor()
    cur.execute("SELECT * FROM newsinput limit 0,50;")
    dict_words = get_words("/python27/NewTotalTag.txt")
    words = []
    for row in cur.fetchall():
        lines = row[3].split(".")
        for line in lines:
            for word in line.split():
                next_word = "NA"
                if word in dict_words:
                    i = dict_words.index(word)
                    # Guard: the last token in the file has nothing after it.
                    if i + 1 < len(dict_words):
                        next_word = dict_words[i + 1]
                words.append(join_suffix(word, next_word))
    db.close()
    chunks = split_sentence(words, 7)
    for chunk in chunks:
        print chunk
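
If speed turns out to be the problem, one likely cause is that
"word in dict_words" and "dict_words.index(word)" each scan the whole list
for every single word. A possible sketch, assuming that what you want is the
token that follows the first occurrence of the word in the dictionary file
(which is what the loop above computes): build a dict once, then do a
constant-time lookup per word. The names build_next_word_table and
lookup_next_word, and the sample data, are made up for this example. A word
that appears only as the very last token has nothing after it, so it falls
back to "NA".

```python
def build_next_word_table(dict_words):
    """Map each word to the token that follows its first occurrence."""
    table = {}
    for i in range(len(dict_words) - 1):
        # setdefault keeps the first occurrence, matching list.index().
        table.setdefault(dict_words[i], dict_words[i + 1])
    return table

def lookup_next_word(table, word):
    """Return the token following word, or "NA" if there is none."""
    return table.get(word, "NA")

# Made-up sample data standing in for the contents of the dictionary file.
dict_words = ["dog", "NN", "runs", "VB"]
table = build_next_word_table(dict_words)
tagged = [w + "/" + lookup_next_word(table, w)
          for w in "dog runs fast".split()]
```

In process(), the table would be built once, right after get_words(), and
the if/else inside the inner loop would become a single lookup_next_word()
call.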

-- 
Steve