File reading using delimiters
Alan Kennedy
alanmk at hotmail.com
Mon Jun 9 17:22:50 EDT 2003
Ben S wrote:
> In this case, I'm reading plain ASCII text files ranging in size from
> 10K to maybe 1Mb, where strings are delimited with the tilde character.
> But I'm asking as much about the available functionality as I am about
> my particular problem. I suppose I could use read() then split() and see
> how the performance works out. I'm surprised if there's nothing that
> lets me read more selectively from the file though.
I think this should solve the problem adequately. I had just started reading
through the codecs documentation, and needed a good example to work through
creating my own codec. Yours seemed like the ideal problem, so I wrote a codec,
the code for which I've pasted here.
It should be easy to use, and, I do believe :-), an efficient way to read the
"lines" one at a time.
To other readers: this is my first stab at a codec, so I'd be grateful for any
pointers on anything that looks out of place, or that could be achieved more
simply or efficiently.
#------------------------------------
import codecs

class TildeStreamReader(codecs.StreamReader):
    def readline(self):
        # Read one character at a time up to the next tilde or end of file.
        # Note: an empty record ("~~") is indistinguishable from EOF here,
        # since both come back as an empty string.
        buf = []
        c = self.stream.read(1)
        while c not in ['~', '']:
            buf.append(c)
            c = self.stream.read(1)
        return "".join(buf)

def tildelines_codec(name):
    # Codec search function: return None for any other encoding name.
    if name != 'tildelines':
        return None
    def tildelines_decode(input, errors = None):
        # Plain string replacement; no regular expression is needed.
        return input.replace("~", "\n"), len(input)
    def tildelines_encode(input, errors = None):
        return input.replace("\n", "~"), len(input)
    return tildelines_encode, tildelines_decode, \
           TildeStreamReader, codecs.StreamWriter

codecs.register(tildelines_codec)

# Do it in one big lump
data = "0123456789~" * 10
data = data.decode('tildelines')
for line in data.splitlines():
    print "'%s'" % line

# Create a sample data file
data = "9876543210~" * 10
filename = "tildes.dat"
f = open(filename, "wt")
f.write(data)
f.close()

# And then go through it a line at a time
for ix, line in enumerate(codecs.open(filename, 'rt', 'tildelines')):
    print "Line %d: '%s'" % (ix, line)
#------------------------------------
And that's Python 2.3, BTW.
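
For anyone reading this on Python 3, where str.decode() no longer exists, a
plain generator may be a simpler way to get the same "one record at a time"
behaviour. What follows is only a sketch under my own assumptions, not part of
the codec above: the name iter_delimited and its chunk_size parameter are
made up for illustration. It reads the file in chunks rather than one
character at a time, and carries any incomplete record over to the next chunk.

```python
import io

def iter_delimited(stream, delimiter="~", chunk_size=8192):
    """Yield delimiter-separated records from a text stream, read in chunks.

    iter_delimited and chunk_size are hypothetical names for this sketch.
    """
    pending = ""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        pending += chunk
        # Everything before the last delimiter is a complete record;
        # the final piece may be partial, so keep it for the next chunk.
        *records, pending = pending.split(delimiter)
        for record in records:
            yield record
    if pending:
        # Trailing data that was not closed by a delimiter.
        yield pending

# Small in-memory demonstration; a real file opened with open() works the same.
sample = io.StringIO("9876543210~" * 3 + "tail")
records = list(iter_delimited(sample, chunk_size=4))
for ix, record in enumerate(records):
    print("Line %d: '%s'" % (ix, record))
```

Unlike a readline() that returns an empty string for both an empty record and
end of file, this version yields empty records ("~~") as empty strings and
simply stops when the data runs out.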
HTH,
--
alan kennedy
-----------------------------------------------------
check http headers here: http://xhaus.com/headers
email alan: http://xhaus.com/mailto/alan