NEWLINE character problem

Tim Peters tim.one at comcast.net
Fri Jan 30 14:11:28 EST 2004


[Tim, suggests
    return s.replace('\r\n', '\n').replace('\r', '\n')
]

[Dragos Chirila]
> Thanks a LOT !!
>
> I will give it a try and see how it works... the problem is that I
> have really big strings (more than 50000 characters)

Why is that a problem?  A 50 megabyte string might be a strain on some
modern machines, but 50K isn't much anymore.

There are many ways to approach this, but I'm not clear on what you want to
do.  For example, you can easily enough split on \r, \n and \r\n "in one
step":

>>> import re
>>> bylines = re.compile(r'\r\n?|\n')
>>> bylines.split("abc\r\ndef\rghi\njkl\n")
['abc', 'def', 'ghi', 'jkl', '']  # watch out for the trailing ''!
>>>
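The compiled `bylines` pattern can also do the job of the two chained `replace` calls from the earlier suggestion in a single `re.sub` pass. The trailing `''` can be dropped by popping it, rather than filtering out all empty strings, so that genuinely blank lines in the middle of the text survive. This is a minimal sketch; `normalize` and `split_lines` are illustrative names, not library functions:

```python
import re

# Matches \r\n, a lone \r, or a lone \n. The \r\n alternative is tried
# greedily first, so a Windows line end counts as one terminator, not two.
bylines = re.compile(r'\r\n?|\n')

def normalize(s):
    # Convert every \r\n and lone \r to \n, like the two chained replace() calls.
    return bylines.sub('\n', s)

def split_lines(s):
    # Split on any terminator, then drop only the single '' that split()
    # leaves behind when s ends with a terminator (interior blank lines stay).
    parts = bylines.split(s)
    if parts and parts[-1] == '':
        parts.pop()
    return parts

print(split_lines("abc\r\ndef\rghi\njkl\n"))  # ['abc', 'def', 'ghi', 'jkl']
```

Both functions make one linear scan over the string, so a string of 50K characters is no problem.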

Or if you want to pick off a line at a time, use a generator:

import re
bylines = re.compile(r'\r\n?|\n')

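def genlines(s):
    # Yield the lines of s, minus their \r\n, \r or \n terminators.
    i, n = 0, len(s)
    while i < n:
        m = bylines.search(s, i)
        if m:
            # Everything before the terminator is one line; resume after it.
            start, finish = m.span(0)
            yield s[i:start]
            i = finish
        else:
            # No terminator left: the rest is a final, unterminated line.
            yield s[i:]
            break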

and then

>>> for line in genlines("abc\r\ndef\rghi\njkl\n"):
...     print("<" + line + ">")
<abc>
<def>
<ghi>
<jkl>

Maybe you want a line end character on each piece too -- don't know, but it
should be easy to fiddle to taste.
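One way to keep the line ends is to slice through the end of each match instead of stopping at its start. The sketch below does this in a hypothetical `genlines_keepends` built on the same `bylines` pattern, and it still yields a final line that has no terminator. Python 3's `str.splitlines(keepends=True)` gives similar results for these three terminators, but it also breaks on other characters such as form feed (`\f`) and U+2028, which may or may not be wanted.

```python
import re

# Matches \r\n, a lone \r, or a lone \n.
bylines = re.compile(r'\r\n?|\n')

def genlines_keepends(s):
    # Yield the lines of s, each with its original terminator still attached.
    i, n = 0, len(s)
    while i < n:
        m = bylines.search(s, i)
        if m:
            # Slice through the end of the terminator, not its start.
            yield s[i:m.end()]
            i = m.end()
        else:
            # Final line without a terminator.
            yield s[i:]
            break

for line in genlines_keepends("abc\r\ndef\rghi\njkl"):
    print(repr(line))
```

Because every character lands in exactly one piece, joining the pieces back together reproduces the original string.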





More information about the Python-list mailing list