File reading using delimiters

Alan Kennedy alanmk at hotmail.com
Mon Jun 9 17:22:50 EDT 2003


Ben S wrote:

> In this case, I'm reading plain ASCII text files ranging in size from
> 10K to maybe 1Mb, where strings are delimited with the tilde character.
> But I'm asking as much about the available functionality as I am about
> my particular problem. I suppose I could use read() then split() and see
> how the performance works out. I'm surprised if there's nothing that
> lets me read more selectively from the file though.

I think this should solve the problem adequately. I had just started reading
through the codecs documentation, and needed a good example to work through
creating my own codec. Yours seemed like the ideal problem, so I wrote a codec,
the code for which I've pasted here. 

It should be easy to use, and, I do believe :-), an efficient way to read the
"lines" one at a time.
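
For comparison, the read() then split() baseline you mention could look
something like the sketch below. The function name read_tilde_records is made
up for illustration, and it assumes the whole file fits comfortably in memory,
which should hold for files of 10K to 1Mb.

```python
def read_tilde_records(filename):
    # Hypothetical helper: slurp the whole file, then split on the tilde.
    f = open(filename, "r")
    try:
        data = f.read()
    finally:
        f.close()
    records = data.split("~")
    # A trailing delimiter leaves one empty string at the end; drop it.
    if records and records[-1] == "":
        records.pop()
    return records
```

Unlike a line-at-a-time reader, this hands back every record at once, so it is
probably best treated as the yardstick to time anything cleverer against.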

To other readers: This is my first stab at a codec, so I'd be grateful for any
pointers on anything that looks out of place, or that could be done more simply
or efficiently.

#------------------------------------

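import codecs

class TildeStreamReader(codecs.StreamReader):

    def readline(self, size=None, keepends=True):
        # Read one character at a time up to the next tilde or end of file.
        # size and keepends are accepted for compatibility and ignored.
        # Caveat: an empty record ("~~") returns "", which the iterator
        # treats as end of file.
        buf = []
        c = self.stream.read(1)
        while c not in ('~', ''):
            buf.append(c)
            c = self.stream.read(1)
        return "".join(buf)

def tildelines_codec(name):
    # Codec search function: answer only to the name 'tildelines'.
    if name != 'tildelines':
        return None
    def tildelines_decode(input, errors=None):
        return input.replace("~", "\n"), len(input)
    def tildelines_encode(input, errors=None):
        # str.replace avoids re.sub, whose "\~" template emitted a backslash.
        return input.replace("\n", "~"), len(input)
    return (tildelines_encode, tildelines_decode,
            TildeStreamReader, codecs.StreamWriter)

codecs.register(tildelines_codec)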

# Do it in one big lump

data = "0123456789~" * 10
data = data.decode('tildelines')
for line in data.splitlines():
    print "'%s'" % line

# Create a sample data file
data = "9876543210~" * 10
filename = "tildes.dat"
f = open(filename, "wt")
f.write(data)
f.close()

# And then go through it a line at a time

for ix, line in enumerate(codecs.open(filename, 'rt', 'tildelines')):
    print "Line %d: '%s'" % (ix, line)

#------------------------------------

And that's Python 2.3, BTW.
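
If the one-character-at-a-time reads in readline() turn out to be slow on the
larger files, one possible alternative is to read in bigger chunks and split
each chunk on the delimiter, holding back the unfinished tail. This is only a
rough sketch without the codec machinery; iter_delimited, delimiter and
chunk_size are names invented here, and 8192 is an arbitrary chunk size.

```python
def iter_delimited(stream, delimiter="~", chunk_size=8192):
    """Yield delimiter-separated records from a file-like object."""
    pending = ""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        pieces = (pending + chunk).split(delimiter)
        # The last piece may be an incomplete record; hold it back.
        pending = pieces.pop()
        for piece in pieces:
            yield piece
    if pending:
        # Data after the final delimiter (file did not end with one).
        yield pending
```

It would be used as "for record in iter_delimited(open(filename)):", and since
it is a generator, an empty record between two tildes comes through as ''
rather than ending the loop.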

HTH,

-- 
alan kennedy
-----------------------------------------------------
check http headers here: http://xhaus.com/headers
email alan:              http://xhaus.com/mailto/alan



