To re or not to re ... (word wrap function?)
Chris Barker
chrishbarker at home.net
Fri Sep 21 18:23:54 EDT 2001
Hi all,
I was just asked if it would be hard to write a script that would take
the output from an MS Word "Save as Text" operation and re-format it
so it was wrapped to 80-character lines. I said it would be easy, then I
thought about it and realised it was not quite as trivial as I thought.
First I came up with a function that used a few string methods, but was
mostly "by hand". Then I tried an re version. It turned out to be not
much easier or shorter, though probably a little faster. I have not
benchmarked it, and frankly, speed is of little concern here: I'm after
elegance.
Anyway, I figured that:
A) someone else must have done this already
B) there should be a cleaner and more elegant way to do this.
C) I probably missed some special cases and haven't gotten it quite right
anyway. (I have a little more faith in the RE version: I did that
second, and thought of a few more special cases to handle.)
Anyone have any suggestions?
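On point A: later Python releases include a `textwrap` module in the standard library, which was not an option when this was written. The sketch below assumes that module is available; `wrap_paragraph` is a hypothetical name, not something from either version in this post. It is only a point of comparison, not a claim that it matches the hand-written functions case for case.

```python
import textwrap


def wrap_paragraph(text, maxchar=80):
    """Wrap one long line into lines of at most maxchar characters.

    Thin wrapper around textwrap.fill, which breaks at whitespace and,
    by default, splits words that are longer than the width.
    """
    return textwrap.fill(text, width=maxchar)


sample = "The quick brown fox jumps over the lazy dog. " * 4
print(wrap_paragraph(sample.strip(), 40))
```

By default `textwrap.fill` also collapses each whitespace character to a single space and drops whitespace at line edges, which may or may not be what is wanted for Word output.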
Note: each function just wraps a single line (or "paragraph" in
Word-speak); it would be part of a script that would do a whole file.
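The whole-file script mentioned above might look roughly like this. It is only a sketch: `wrap_file` is a hypothetical name, it assumes one paragraph per input line (as in Word's text export), and it takes the single-paragraph wrapper as an argument so that either version can be plugged in. `textwrap.fill` from the standard library is used purely as a stand-in here.

```python
import io
import textwrap


def wrap_file(infile, outfile, wrap, maxchar=80):
    """Wrap every line ("paragraph") read from infile and write it to outfile.

    wrap is any callable taking (text, maxchar) and returning the wrapped
    text as a single string with embedded newlines.
    """
    for line in infile:
        paragraph = line.rstrip("\r\n")
        if paragraph.strip():
            outfile.write(wrap(paragraph, maxchar) + "\n")
        else:
            # keep blank lines so paragraph separation survives
            outfile.write("\n")


# Small in-memory demonstration; real use would pass open file objects.
source = io.StringIO("short line\n\n" + "word " * 30 + "\n")
result = io.StringIO()
wrap_file(source, result, textwrap.fill, 40)
print(result.getvalue())
```

Passing the wrapper in as a parameter keeps the file handling separate from the wrapping logic, so the two versions can be swapped without touching the driver.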
Here is the non-re version:
import string

def WordWrap(text, maxchar=80):
    """
    A function that formats a single long line into lines that are a
    max of maxchar long.
    """
    if len(text) <= maxchar:
        return text
    new_text = []
    begin = 0
    # text[end] is the character just past a full-length line; if it is
    # whitespace, the line text[begin:end] fits and the space is dropped.
    end = maxchar
    while end < len(text):
        if text[end] in string.whitespace:
            # break at this whitespace, skipping empty pieces
            line = text[begin:end].strip()
            if line:
                new_text.append(line)
            begin = end + 1
            end = begin + maxchar
        elif end == begin:  # no whitespace at all: split the long word
            new_text.append(text[begin:begin + maxchar])
            begin += maxchar
            end = begin + maxchar
        else:
            end -= 1  # back up, looking for whitespace to break at
    # whatever is left fits on one line
    line = text[begin:].strip()
    if line:
        new_text.append(line)
    return "\n".join(new_text)
# Here is the re version
import re

def WordWrap2(text, maxchar=80):
    """
    A function that formats a single long line into lines that are a
    max of maxchar long.
    """
    # A non-whitespace character, then up to maxchar-1 more characters,
    # followed by whitespace or the end of the text.
    pattern = r"\s*(\S.{0,%d})(?:\s+|$)" % (int(maxchar) - 1)
    p = re.compile(pattern)
    new_text = []
    start = 0
    while start < len(text):
        match = p.match(text, start)
        if match:
            new_text.append(match.group(1).strip())
            start = match.end()
        else:
            # There is no whitespace in maxchar characters: split the word
            chunk = text[start:start + maxchar].strip()
            if chunk:  # don't append if it's nothing but whitespace
                new_text.append(chunk)
            start += maxchar
    return "\n".join(new_text)
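Since missed special cases are the main worry, a small checker could be run against either function. The sketch below is an assumption about what "right" means here: no line longer than the limit, no stray whitespace at line edges, and no non-whitespace characters lost. `check_wrap` and `SPECIAL_CASES` are hypothetical names, and `textwrap.fill` is only a stand-in for the function under test.

```python
import textwrap


def check_wrap(wrap, text, maxchar):
    """Return a list of problems found in wrap(text, maxchar); empty means OK."""
    problems = []
    wrapped = wrap(text, maxchar)
    for number, line in enumerate(wrapped.split("\n"), 1):
        if len(line) > maxchar:
            problems.append("line %d is %d characters long" % (number, len(line)))
        if line != line.strip():
            problems.append("line %d has leading or trailing whitespace" % number)
    # Compare the text with all whitespace removed, so only the breaks differ.
    if "".join(wrapped.split()) != "".join(text.split()):
        problems.append("non-whitespace characters were lost or changed")
    return problems


SPECIAL_CASES = [
    "short",
    "two  spaces  between  words",
    "x" * 25,                    # a single word longer than the limit
    "ends with whitespace   ",
    "word " * 10,
]

for case in SPECIAL_CASES:
    # Replace textwrap.fill with the wrapper being tested.
    print(repr(case[:20]), check_wrap(textwrap.fill, case, 10))
```

An empty list for every case does not prove the wrapper correct, but a non-empty one points straight at the case that broke.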
thanks,
-Chris
--
Christopher Barker, Ph.D.
ChrisHBarker at home.net --- --- ---
http://members.home.net/barkerlohmann ---@@ -----@@ -----@@
------@@@ ------@@@ ------@@@
Oil Spill Modeling ------ @ ------ @ ------ @
Water Resources Engineering ------- --------- --------
Coastal and Fluvial Hydrodynamics --------------------------------------
------------------------------------------------------------------------