RE Module Performance

Chris Angelico rosuav at gmail.com
Tue Jul 30 10:45:57 EDT 2013


On Tue, Jul 30, 2013 at 3:01 PM,  <wxjmfauth at gmail.com> wrote:
> I am pretty sure that once you have typed your 127504
> ascii characters, you are very happy the buffer of your
> editor does not waste time in reencoding the buffer as
> soon as you enter an €, the 125505th char. Sorry, I wanted
> to say z instead of euro, just to show that backspacing the
> last char and reentering a new char implies twice a reencoding.

You're still thinking that the editor's buffer is a Python string. As
I've shown earlier, this is a really bad idea, and that has nothing to
do with FSR/PEP 393. An immutable string is *horribly* inefficient at
this; if you want to keep concatenating onto a string, the recommended
method is a list of strings that gets join()d at the end, and the same
technique works well here. Here's a little demo class that could make
the basis for such a system:

class EditorBuffer:
    """Text held as a list of string chunks rather than one big string."""

    def __init__(self, fn):
        self.fn = fn
        # Start with the whole file as a single chunk
        with open(fn) as f:
            self.buffer = [f.read()]

    def insert(self, pos, char):
        if pos == 0:
            # Special case: insertion at beginning of buffer
            if len(self.buffer[0]) > 1024:
                self.buffer.insert(0, char)
            else:
                self.buffer[0] = char + self.buffer[0]
            return
        for idx, part in enumerate(self.buffer):
            l = len(part)
            if pos > l:
                # Cursor is past this string; try the next one
                pos -= l
                continue
            if pos < l:
                # Cursor is somewhere inside this string: split it there
                self.buffer[idx:idx+1] = part[:pos], part[pos:]
                l = pos
            # Cursor is now at the end of this string
            if l > 1024:
                # Too big to extend cheaply; start a new chunk after it
                self.buffer[idx:idx+1] = self.buffer[idx], char
            else:
                self.buffer[idx] += char
            return
        raise ValueError("Cannot insert past end of buffer")

    def __str__(self):
        return ''.join(self.buffer)

    def save(self):
        # Write back to the file the buffer was loaded from
        with open(self.fn, "w") as f:
            f.write(str(self))

Once the string under the cursor has been split (a one-time copy of
that string), inserts never need to resize more than about 1KB of text
(1024 characters or so). As a real basis for an editor, it still sucks,
but it's purely to prove this one point.
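
As a rough check of that claim, here is a standalone sketch of the same
chunking logic. The function name chunked_insert and its limit parameter
are made up for this illustration and are not part of the class above.
It types 5000 characters into the middle of a 100000-character text
(starting with the euro sign from the quoted example) and compares the
result against plain string slicing.

```python
def chunked_insert(chunks, pos, char, limit=1024):
    """Insert char at offset pos in a list of string chunks, in place."""
    if pos == 0:
        if len(chunks[0]) > limit:
            chunks.insert(0, char)
        else:
            chunks[0] = char + chunks[0]
        return
    for idx, part in enumerate(chunks):
        size = len(part)
        if pos > size:
            # Cursor is past this chunk
            pos -= size
            continue
        if pos < size:
            # Split so that the cursor sits on a chunk boundary
            chunks[idx:idx + 1] = part[:pos], part[pos:]
            size = pos
        if size > limit:
            chunks.insert(idx + 1, char)  # start a fresh small chunk
        else:
            chunks[idx] += char           # extend a small chunk
        return
    raise ValueError("Cannot insert past end of buffer")


text = "a" * 100000
chunks = [text]      # chunked representation
expected = text      # naive single-string representation
cursor = 50000
for i in range(5000):
    ch = "\u20ac" if i == 0 else "z"  # first keystroke is a euro sign
    chunked_insert(chunks, cursor, ch)
    expected = expected[:cursor] + ch + expected[cursor:]
    cursor += 1

result = "".join(chunks)
typed_chunks = chunks[1:-1]  # everything between the two original halves
print(result == expected)
print(len(chunks), max(len(c) for c in typed_chunks))
```

The first keystroke pays for one split of the 100000-character string;
after that, each keystroke only extends a string of at most 1024
characters or starts a new one, whereas the naive version rebuilds the
entire text every time.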

ChrisA


