To re or not to re ... ( word wrap function?)

Chris Barker chrishbarker at home.net
Fri Sep 21 18:23:54 EDT 2001


Hi all,

I was just asked if it would be hard to write a script that would take
the output from an MS Word "save as text" operation, and re-format it
so it was wrapped to 80 character lines. I said it would be easy, then I
thought about it and realised it was not quite as trivial as I thought.
First I came up with a function that used a few string methods, but was
mostly "by hand". Then I tried an re version. It turned out to be not
much easier or shorter, though probably a little faster. I have not
benchmarked it, and frankly, speed is of little concern here: I'm after
elegance.

Anyway, I figured that:
A) someone else must have done this already

B) there should be a cleaner and more elegant way to do this.

C) I probably missed some special cases and haven't gotten it quite right
anyway. (I have a little more faith in the re version: I did that
second, and thought of a few more special cases to handle.)

Anyone have any suggestions?

note: this function just wraps a single line (or "paragraph" in
Word-speak); it would be part of a script that would do a whole file.
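
For the whole-file part, a rough sketch might look like the following.
It assumes each line of the saved text is one paragraph and that empty
lines should pass through unchanged. The names wrap_paragraphs and wrap
are made up for illustration; wrap stands for whichever single-line
wrapping function ends up being used.

```python
def wrap_paragraphs(text, wrap, maxchar=80):
    """Wrap every line ("paragraph") of text with the given function.

    wrap is any function taking (line, maxchar) and returning the
    wrapped line as a single string with embedded newlines.
    """
    wrapped = []
    for paragraph in text.split("\n"):
        if paragraph.strip():
            wrapped.append(wrap(paragraph, maxchar))
        else:
            # empty lines separate paragraphs; keep them as they are
            wrapped.append("")
    return "\n".join(wrapped)
```

A script would read the saved text into a string, call this once, and
write the result back out.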

Here is the non-re version:

import string

def WordWrap(text, maxchar=80):
    """
    A function that formats a single long line into lines that are a
    max of maxchar long, breaking at whitespace where possible.
    """

    if len(text) <= maxchar:
        return text

    new_text = []
    begin = 0
    end = maxchar  # text[end] is the first character past a full line

    while end < len(text):
        if text[end] in string.whitespace:
            # break here; the whitespace character itself is dropped
            line = text[begin:end].strip()
            if line:  # skip lines that are nothing but whitespace
                new_text.append(line)
            begin = end + 1
            end = begin + maxchar
        elif end == begin:
            # no whitespace within maxchar characters: hard break
            new_text.append(text[begin:begin + maxchar])
            begin += maxchar
            end = begin + maxchar
        else:
            end -= 1  # back up, looking for whitespace to break at

    # whatever is left fits on one line
    rest = text[begin:].strip()
    if rest:
        new_text.append(rest)
    return "\n".join(new_text)



# Here is the re version

import re

def WordWrap2(text, maxchar=80):
    """
    A function that formats a single long line into lines that are a
    max of maxchar long, breaking at whitespace where possible.
    """

    # One non-whitespace character plus up to maxchar-1 more characters,
    # followed by whitespace or the end of the text.
    pattern = r"\s*(\S.{0," + str(int(maxchar) - 1) + r"})(?:\s+|$)"
    p = re.compile(pattern)

    new_text = []
    start = 0
    while start < len(text):
        match = p.match(text, start)
        if match:
            new_text.append(match.group(1).strip())
            start = match.end()
        else:
            # there is no whitespace in maxchar characters: hard break
            chunk = text[start:start + maxchar].strip()
            if chunk:  # don't append if it's nothing but whitespace
                new_text.append(chunk)
            start += maxchar

    return "\n".join(new_text)
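
Since the worry is missed special cases, a small checker along these
lines might help when comparing versions. It is only a sketch:
check_wrap is a made-up name, and it assumes a wrapper takes
(text, maxchar) and returns one string with embedded newlines. It
tests three things: no line is longer than maxchar, no line keeps
leading or trailing whitespace, and no non-whitespace characters are
lost or added.

```python
def check_wrap(wrap, text, maxchar=80):
    """Return a list of problems found in wrap(text, maxchar)."""
    problems = []
    result = wrap(text, maxchar)
    for number, line in enumerate(result.split("\n")):
        if len(line) > maxchar:
            problems.append("line %d is %d characters long"
                            % (number, len(line)))
        if line != line.strip():
            problems.append("line %d has leading or trailing whitespace"
                            % number)
    # wrapping should only move whitespace around, never change the text
    if "".join(result.split()) != "".join(text.split()):
        problems.append("non-whitespace characters were changed")
    return problems
```

Inputs worth trying: a run longer than maxchar with no whitespace at
all, several spaces in a row, trailing whitespace, and text that is
exactly maxchar long.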


thanks, 
-Chris

-- 
Christopher Barker, Ph.D.
ChrisHBarker at home.net                 ---           ---           ---
http://members.home.net/barkerlohmann ---@@       -----@@       -----@@
                                   ------@@@     ------@@@     ------@@@
Oil Spill Modeling                ------   @    ------   @   ------   @
Water Resources Engineering       -------      ---------     --------    
Coastal and Fluvial Hydrodynamics --------------------------------------
------------------------------------------------------------------------


