compressing consecutive spaces

attn.steven.kuo at gmail.com
Mon Jul 9 14:09:06 EDT 2007


On Jul 9, 7:38 am, Beliavsky <beliav... at aol.com> wrote:
> How can I replace multiple consecutive spaces in a file with a single
> character (usually a space, but maybe a comma if converting to a CSV
> file)? Ideally, the Python program would not compress consecutive
> spaces inside single or double quotes. An inelegant method is to
> repeatedly replace two consecutive spaces with one.



One can try mx.TextTools.  E.g.,


from mx.TextTools import *
import re

string_inside_quotes=re.compile(r'(?P<quote>["\']).*?(?<!\\)(?P=quote)',
        re.MULTILINE)

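def advance_position(text, position, len_text, sre):
    # Invoked from the tag table through CallArg.  Returns the position
    # just past a quoted string that starts at `position`, or `position`
    # unchanged if none starts there.  len_text is not used here.
    mobj = sre.match(text[position:])
    if mobj:
        incr = len(mobj.group(0))
    else:
        incr = 0
    return position + incr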


table = ('try_again',
        ('quoted_string', CallArg,
            (advance_position, string_inside_quotes), +1, 'try_again'),
        ('nonspace', AllNotIn, ' ', +1, 'try_again'),
        ('space', AllIn, ' ', +1, 'try_again'),
        (None, EOF, Here, +1, MatchOk),
        (None, Fail, Here),)

for target_string in (
"     Try    using mx.TextTools 'for parsing    strings'",
"'It might    be' just what you needed",
'I find   "it    worthwhile"',
):
    print "BEFORE:%s" % target_string
    _, taglist, _ = tag(target_string, table)
    if taglist:
        tokens = []
        for t in taglist:
            tagobj, left_index, right_index = t[0:3]
            if tagobj == 'space':
                tokens.append(' ')
            else:
                tokens.append(target_string[left_index:right_index])
        print "AFTER:%s" % ''.join(tokens)
    else:
        print "Something went horribly wrong"
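
If installing mx.TextTools is not an option, roughly the same effect can
probably be had with the standard re module alone.  What follows is only
a sketch, not a drop-in replacement for the table above: compress_spaces
and its replacement parameter are names made up for this example.  It
reuses the quoted-string idea (a quote, a lazy match, then the same quote
not preceded by a backslash), so it shares the same weakness: a stray
apostrophe followed later by another one is taken as a quoted string.

```python
import re

# Hypothetical helper, not part of the original post.
# One alternative matches a whole quoted string (kept verbatim), the
# other matches a run of one or more spaces (collapsed).
_token = re.compile(r'''(?P<quoted>(?P<q>["']).*?(?<!\\)(?P=q))|(?P<spaces>[ ]+)''')

def compress_spaces(text, replacement=' '):
    """Replace each run of spaces outside quotes with `replacement`."""
    def fix(mobj):
        if mobj.group('quoted') is not None:
            # Quoted text is returned untouched.
            return mobj.group('quoted')
        return replacement
    return _token.sub(fix, text)

for target_string in (
    "     Try    using mx.TextTools 'for parsing    strings'",
    "'It might    be' just what you needed",
    'I find   "it    worthwhile"',
):
    print("BEFORE:%s" % target_string)
    print("AFTER:%s" % compress_spaces(target_string))
```

Passing ',' as the replacement turns every run of spaces outside quotes
into a comma, which may be close to what is wanted for the CSV case.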


--
Hope this helps,
Steven



