slow loop?
maney at pobox.com
Thu Jan 16 10:44:52 EST 2003
Brandon Beck <bbeck at nospam.austin.rr.com> wrote:
> Seems like you're trying to write a CSV parser. If so, I strongly
> suggest you get the Object Craft CSV module. It handles all of the
OTOH, if you want a Python implementation for basic CSV, here's what I
did after finding that the pure-Python version(s) listed in the Vaults
(one? several?) had glitches. Performance is good enough that, on a
P2/233, it doesn't annoy me when processing a file of about 7000 rather
long lines (150 or so characters, two dozen or so fields each). I have
nothing against fast C implementations, but sometimes you want to be
able to share a program with others without requiring them to muck
about with installing other stuff.
It's designed to be used in what seems like the obvious way, one line
at a time, even though my previous work used a parser that wanted to
massage the entire input file at once (to be fair, it was trying to
guess which style of CSV-like thing you had, and generally did a fair
job of it):

import csv
for l in someFile.xreadlines():
    fields = csv.split(l)
    ...

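For anyone on a newer Python, where xreadlines() is gone and a file can be iterated directly, the same line-at-a-time pattern might look like the sketch below. The names iter_rows and naive_split are made up for illustration; naive_split is only a stand-in with no quoting support, so the block runs without the module from this post.

```python
import io


def iter_rows(lines, split_func):
    """Yield (line_number, fields) for each line, parsing one line at a time.

    split_func is any callable that turns one line into a list of fields,
    for example the split() function from the module in this post.
    """
    for lineno, line in enumerate(lines, 1):
        yield lineno, split_func(line)


# Stand-in splitter for demonstration only: it does not handle quotes.
def naive_split(line):
    return line.rstrip('\n').split(',')


# io.StringIO plays the role of an open text file here.
sample = io.StringIO('a,b,c\n1,2,3\n')
rows = list(iter_rows(sample, naive_split))
print(rows)  # [(1, ['a', 'b', 'c']), (2, ['1', '2', '3'])]
```

Carrying the line number along costs nothing and makes it easier to report which input line a parse error came from.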
# $Id: csv.py,v 1.3 2003/01/13 21:00:28 maney Exp $

#import string
#import re

class CSVError(Exception):
    pass

def split(s):
    """\
    Split the argument string s into a list of strings, one element for each
    CSV-formatted field in s.  This simple version recognizes only
    comma-separated fields with double-quotes as optional delimiters; the
    usual hack of using '""' within a quoted field as an escaped double-
    quote is supported.  The input string may include a terminating newline,
    but it need not do so.

    On error, a CSVError exception is raised.  It carries three strings
    with it: a description of the error, the portion of the input string
    that has been processed successfully, and the unprocessed tail that
    contains the error.
    """
    res = []
    i = 0
    start = i
    end = i
    n = len(s)
    if n > 0 and s[-1] == '\n':
        n = n - 1                   # ignore one terminating newline
    while 1:
        #
        # the current character is the start of a field: either a
        # quoted field...
        #
        if i < n and s[i] == '"':
            i += 1
            start = i               # start is first data char
            end = -1                # end < start: not found yet
            while i < n:
                j = s.find('"', i)
                if j < 0:           # oops, no quote found
                    break
                if j + 1 < n and s[j + 1] == '"':   # doubled quote: pass it
                    i = j + 2
                else:               # must be the closing quote
                    i = j + 1
                    end = j
                    break
            if end < start:
                raise CSVError('ill-formed field: no closing quote',
                               s[:start-1], s[start-1:])
            field = '"'.join(s[start:end].split('""'))
        #
        # ... or an unquoted field
        #
        else:
            start = i
            j = s.find(',', i)
            if j >= 0:
                i = j
            else:
                i = n
            field = s[start:i]
        #
        # append field to result list, see if there's another to parse
        #
        res.append(field)
        if n <= i:
            break
        if s[i] == ',':
            i += 1
        elif i < n:
            raise CSVError('ill-formed line: start of field not found',
                           s[:i], s[i:])
    return res

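
To make the rules in the docstring concrete, here is a sketch that runs a few sample lines through a condensed restatement of the same scanning logic, written for Python 3. The names split_line and LineError are hypothetical stand-ins for split and CSVError, used so the block runs without the module, and the exception arguments follow the docstring's description (message, parsed part, unparsed tail).

```python
class LineError(Exception):
    """Stand-in for CSVError: args are (message, parsed part, unparsed tail)."""


def split_line(s):
    """Condensed restatement of the split() logic, for illustration only."""
    n = len(s)
    if n > 0 and s[-1] == '\n':
        n -= 1                          # ignore one trailing newline
    res = []
    i = 0
    while True:
        if i < n and s[i] == '"':       # quoted field
            start = i + 1               # first data character
            i = start
            end = -1                    # closing quote not found yet
            while i < n:
                j = s.find('"', i, n)
                if j < 0:
                    break
                if j + 1 < n and s[j + 1] == '"':
                    i = j + 2           # doubled quote: skip the pair
                else:
                    end = j             # closing quote
                    i = j + 1
                    break
            if end < start:
                raise LineError('no closing quote',
                                s[:start - 1], s[start - 1:])
            field = s[start:end].replace('""', '"')
        else:                           # unquoted field
            start = i
            j = s.find(',', i, n)
            i = j if j >= 0 else n
            field = s[start:i]
        res.append(field)
        if i >= n:
            return res
        if s[i] != ',':
            raise LineError('start of field not found', s[:i], s[i:])
        i += 1


print(split_line('a,"b ""c""",d\n'))    # ['a', 'b "c"', 'd']
print(split_line('x,,"y,z"'))           # ['x', '', 'y,z']

try:
    split_line('a,"oops')
except LineError as err:
    message, parsed, tail = err.args
    print(message, repr(parsed), repr(tail))   # no closing quote 'a,' '"oops'
```

Note that an empty input line comes back as a single empty field, and a comma inside quotes stays part of the field rather than splitting it.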