Script suddenly stops
Peter Otten
__peter__ at web.de
Fri May 30 03:26:57 EDT 2014
Chris wrote:
> Dear All,
>
> I'm trying to read ten 200 MB textfiles into a MySQL MyISAM database
> (Linux, ext4). The script output is suddenly stopping, while the Python
> process is still running (or should I say sleeping?). It's not in top,
> but in ps visible.
>
> Why is it stopping? Is there a way to make it continue, without calling
> "kill -9", deleting the processed lines and starting it again?
>
> Thank you in advance.
>
>
>
> [1] http://pastebin.com/CxHCA9eB
>
> #!/usr/bin/python
>
> import MySQLdb, pprint, re
> db = None
> daten = "/home/chris/temp/data/data/"
> host = "localhost"
> user = "data"
> passwd = "data"
> database = "data"
> table = "data"
>
> def connect_mysql():
>     global db, host, user, passwd, database
>     db = MySQLdb.connect(host, user, passwd, database)
>     return(db)
>
>
> def read_file(srcfile):
>     lines = []
>     f = open(srcfile, 'r')
>     while True:
>         line = f.readline()
>         #print line
>         lines.append(line)
>         if len(line) == 0:
>             break
>     return(lines)
The read_file() function looks suspicious. It is a roundabout way of reading
the whole file into memory as a list of lines. With 200 MB files, maybe your
system is just swapping?
Throw read_file() away and instead iterate over the file directly (see
below).
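As a minimal sketch of the difference (the count_lines() helper and the temporary demo file are made up for illustration and are not part of the original script): iterating over a file object yields one line at a time, so memory use stays roughly constant no matter how large the file is.

```python
import os
import tempfile


def count_lines(path):
    """Count lines by iterating over the file object lazily."""
    count = 0
    with open(path) as f:
        for line in f:  # one line (plus a small read buffer) in memory at a time
            count += 1
    return count


# Hypothetical demo data standing in for one of the large text files.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("a\tb\n" * 1000)
    demo_path = tmp.name

n_lines = count_lines(demo_path)
os.remove(demo_path)
print(n_lines)
```

The same loop shape works for the import: replace the counter with whatever should happen to each line.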
> def write_db(anonid, query, querytime, itemrank, clickurl):
>     global db, table
>     print "write_db aufgerufen."
>     cur = db.cursor()
>     try:
>         cur.execute("""INSERT INTO data
> (anonid,query,querytime,itemrank,clickurl) VALUES (%s,%s,%s,%s,%s)""",
>             (anonid,query,querytime,itemrank,clickurl))
>         db.commit()
>     except:
>         db.rollback()
>
>
> def split_line(line):
>     print "split_line called."
>     print "line is:", line
>     searchObj = re.split(
>         r'(\d*)\t(.*)\t([0-9: -]+)\t(\d*)\t([A-Za-z0-9._:/ -]*)',
>         line, re.I|re.U)
>     return(searchObj)
>
>
>
> db = connect_mysql()
> pprint.pprint(db)
with open(daten + "test-07b.txt") as lines:
    for line in lines:
        result = split_line(line)
        write_db(result[1], result[2], result[3], result[4], result[5])
> db.close()
Random remarks:
- A bare except is evil. It swallows every exception (even
KeyboardInterrupt), so you lose the message that would tell you why an
insert failed.
- A 'global' statement is only needed to rebind a module-global variable,
not to access such a variable. At first glance all your 'global'
declarations seem superfluous.
- You could change the signature of write_db() to accept result[1:6].
- Do you really need a new cursor for every write? Keep one around as a
global.
- You might try cur.executemany() to speed things up a bit.
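The remarks above can be sketched roughly as follows. This is an illustration under stated assumptions, not a drop-in replacement: it uses sqlite3 from the standard library as a stand-in so that it runs without a MySQL server, and the write_rows() helper and the sample rows are invented for the example. With MySQLdb the placeholders would be %s instead of ?, and the exception to catch would be MySQLdb.Error.

```python
import sqlite3

# sqlite3 uses "?" placeholders; MySQLdb would use "%s" here.
INSERT_SQL = (
    "INSERT INTO data (anonid, query, querytime, itemrank, clickurl) "
    "VALUES (?, ?, ?, ?, ?)"
)


def write_rows(db, cur, rows):
    """Insert a batch of rows with one executemany() call and one commit."""
    try:
        cur.executemany(INSERT_SQL, rows)
        db.commit()
    except sqlite3.Error as err:  # specific exception instead of a bare except
        db.rollback()
        print("insert failed:", err)
        raise


# Stand-in database; connection and cursor are passed in, so no 'global' is needed.
db = sqlite3.connect(":memory:")
cur = db.cursor()  # one cursor, reused for every batch
cur.execute(
    "CREATE TABLE data "
    "(anonid TEXT, query TEXT, querytime TEXT, itemrank TEXT, clickurl TEXT)"
)

# Invented sample rows; in the real script each row would be result[1:6].
rows = [
    ("1", "python list", "2006-03-01 12:00:00", "1", "http://example.com/a"),
    ("2", "mysql insert", "2006-03-01 12:00:05", "2", "http://example.com/b"),
]
write_rows(db, cur, rows)

cur.execute("SELECT COUNT(*) FROM data")
row_count = cur.fetchone()[0]
print(row_count)
db.close()
```

For files of this size the rows would probably be collected in batches of a few thousand and passed to write_rows() per batch, rather than building one list for the whole file.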