Script suddenly stops

Peter Otten __peter__ at web.de
Fri May 30 03:26:57 EDT 2014


Chris wrote:

> Dear All,
> 
> I'm trying to read ten 200 MB text files into a MySQL MyISAM database
> (Linux, ext4). The script's output suddenly stops, while the Python
> process is still running (or should I say sleeping?). It doesn't show
> up in top, but it is visible in ps.
> 
> Why is it stopping? Is there a way to make it continue, without calling
> "kill -9", deleting the processed lines and starting it again?
> 
> Thank you in advance.
> 
> 
> 
> [1] http://pastebin.com/CxHCA9eB
> 

> #!/usr/bin/python
>  
> import MySQLdb, pprint, re
> db = None
> daten = "/home/chris/temp/data/data/"
> host = "localhost"
> user = "data"
> passwd = "data"
> database = "data"
> table = "data"
>  
> def connect_mysql():
>     global db, host, user, passwd, database
>     db = MySQLdb.connect(host, user, passwd, database)
>     return(db)
>  
>  
> def read_file(srcfile):
>     lines = []
>     f = open(srcfile, 'r')
>     while True:
>         line = f.readline()
>         #print line
>         lines.append(line)
>         if len(line) == 0:
>             break
>     return(lines)

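The read_file() function looks suspicious. It takes a roundabout route to
reading the whole file into memory, one string object per line, and such a
list needs noticeably more memory than the file occupies on disk. With
200 MB files, maybe your system is just swapping?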

Throw read_file() away and instead iterate over the file directly (see 
below).
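A minimal sketch of the difference, written for Python 3 with a throwaway temp file; read_all() and iter_lines() are hypothetical names. read_all() materialises every line in one list, much as read_file() does, while iterating over the file object hands out one line at a time, so only the current line has to be in memory.

```python
import os
import tempfile


def read_all(path):
    """Materialise every line in one list, much like read_file()."""
    with open(path) as f:
        return f.readlines()


def iter_lines(path):
    """Hypothetical streaming helper: yield one line at a time."""
    with open(path) as f:
        for line in f:  # the file object is its own lazy line iterator
            yield line


# Build a small tab-separated sample file to compare the two approaches.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as f:
    f.write("1\tfoo\t2006-03-01 07:17:12\t1\thttp://example.com\n")
    f.write("2\tbar\t2006-03-02 08:00:00\t\t\n")

lines = read_all(path)        # whole file in memory at once
streamed = iter_lines(path)   # generator: nothing has been read yet
first = next(streamed)        # reads only the first line
streamed.close()              # stops the generator, closing its file
os.remove(path)
```

Feeding iter_lines(path), or the open file itself, to the insert loop keeps memory use roughly constant regardless of the file size.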

> def write_db(anonid, query, querytime, itemrank, clickurl):
>     global db, table
>     print "write_db called."
>     cur = db.cursor()
>     try:    
>         cur.execute("""INSERT INTO data (anonid,query,querytime,itemrank,clickurl)
>                        VALUES (%s,%s,%s,%s,%s)""",
>                     (anonid,query,querytime,itemrank,clickurl))
>         db.commit()
>     except:
>         db.rollback()
>  
>  
> def split_line(line):
>     print "split_line called."
>     print "line is:", line
>     searchObj = re.split(r'(\d*)\t(.*)\t([0-9: -]+)\t(\d*)\t([A-Za-z0-9._:/ -]*)',
>                          line, flags=re.I|re.U)
>     return(searchObj)
>  
>  
>  
> db = connect_mysql()
> pprint.pprint(db)

with open(daten + "test-07b.txt") as lines:
    for line in lines:
        result = split_line(line)
        write_db(result[1], result[2], result[3], result[4], result[5])

> db.close()
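Why the loop uses result[1] through result[5]: when the pattern contains capturing groups, re.split() returns the text before the match, then every group, then the text after it. A small check, written for Python 3; the sample log line is invented for illustration.

```python
import re

# The pattern from split_line(), rejoined after the mail client wrapped it.
PATTERN = r'(\d*)\t(.*)\t([0-9: -]+)\t(\d*)\t([A-Za-z0-9._:/ -]*)'


def split_line(line):
    # Pass the flags by keyword: the third positional argument of
    # re.split() is maxsplit, not flags.
    return re.split(PATTERN, line, flags=re.I | re.U)


line = "142\tcheap flights\t2006-03-01 07:17:12\t1\thttp://www.example.com\n"
result = split_line(line)
# result[0] is the text before the match, result[1:6] are the five groups,
# result[6] is whatever follows the match (here the trailing newline).
fields = result[1:6]
```

A line that does not match at all comes back as a one-element list, so result[1] would raise IndexError.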

Random remarks:

- A bare except is evil: it silently swallows every error, so you lose
  valuable information about what went wrong.
- A 'global' statement is only needed to rebind a module-global variable,
  not to access such a variable. At first glance all your 'global'
  declarations seem superfluous.
- You could change the signature of write_db() to accept result[1:6].
- Do you really need a new cursor for every write? Keep one around as a
  global.
- You might try cur.executemany() to speed things up a bit.
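The remarks combined, as a sketch: one cursor for the whole load, executemany() in batches with one commit per batch, a specific exception instead of a bare except, and the connection passed in as a parameter rather than through globals. To stay runnable without a MySQL server it uses the standard library's sqlite3 module as a stand-in; both follow the DB-API 2.0, but sqlite3 uses ? placeholders where MySQLdb uses %s, and with MySQLdb you would catch MySQLdb.Error. It also swaps re.split() for re.search(), whose groups() are the same five fields. The helper names and the batch size are hypothetical.

```python
import itertools
import re
import sqlite3

PATTERN = re.compile(r'(\d*)\t(.*)\t([0-9: -]+)\t(\d*)\t([A-Za-z0-9._:/ -]*)')

# sqlite3 stands in for MySQLdb here; with MySQLdb the placeholders are %s.
INSERT = ("INSERT INTO data (anonid, query, querytime, itemrank, clickurl) "
          "VALUES (?, ?, ?, ?, ?)")


def batched(iterable, size):
    """Yield lists of at most `size` items (hypothetical helper)."""
    it = iter(iterable)
    while True:
        chunk = list(itertools.islice(it, size))
        if not chunk:
            return
        yield chunk


def records(lines):
    """Yield 5-tuples; search() finds the same first match re.split() uses."""
    for line in lines:
        m = PATTERN.search(line)
        if m:  # skip lines that don't match, e.g. a header line
            yield m.groups()


def load(db, lines, batch_size=1000):
    cur = db.cursor()  # one cursor for the whole load
    try:
        for chunk in batched(records(lines), batch_size):
            try:
                cur.executemany(INSERT, chunk)
            except sqlite3.Error:
                db.rollback()
                raise  # don't swallow the error: let the caller see it
            db.commit()  # one commit per batch, not per row
    finally:
        cur.close()


# Usage with an in-memory database and a few invented lines.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE data (anonid, query, querytime, itemrank, clickurl)")
sample = [
    "142\tcheap flights\t2006-03-01 07:17:12\t1\thttp://www.example.com\n",
    "217\tweather\t2006-03-02 10:00:00\t\t\n",
    "AnonID\tQuery\tQueryTime\tItemRank\tClickURL\n",
]
load(db, sample, batch_size=2)
count = db.execute("SELECT COUNT(*) FROM data").fetchone()[0]
```

With a real file you would pass the open file object as `lines`, so reading, parsing and inserting all stream without holding the file in memory.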



