Trouble with regexes

Wed May 25 12:29:29 EDT 2005

Fernando Rodriguez wrote:

> I'm trying to write a regex that matches a \r char if and only if it
> is not followed by a \n (I want to translate text files from unix
> newlines to windows\dos).

Unix uses \n and Windows uses \r\n, so matching lone \r isn't
going to help you in the slightest... (read on)
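to make that concrete, here is a small sketch (the sample string is made up for illustration, and the print calls assume Python 3). a Unix-style text contains no \r at all, so the thing to act on is the \n:

```python
# illustrative sample data, not from the original post
unix_text = "one\ntwo\n"

# a Unix-style text has no "\r" in it, so a "\r" pattern finds nothing
has_cr = "\r" in unix_text

# naive conversion; only safe if the input has no "\r\n" pairs already
windows_text = unix_text.replace("\n", "\r\n")

print(has_cr)
print(repr(windows_text))
```

if the input might already contain \r\n pairs, a plain replace would double the \r, so treat this as a sketch rather than a general converter.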

> I tried this, but it doesn't work:
> p = re.compile(r'(\r)[^\n]', re.IGNORECASE)
>
> it still matches a string such as r'\r\n'

really?

>>> import re
>>> p = re.compile(r'(\r)[^\n]', re.IGNORECASE)
>>> print p.match('\r\n')
None
>>> print p.match(r'\r\n')
None

on the other hand,

<_sre.SRE_Match object at 0x0083B160>
>>> print p.match('\rx')
<_sre.SRE_Match object at 0x0083B120>
>>> print p.match(r'\rx')
None

it might be a good idea to play a little more with ''-literals and r''-
literals (and print x and print repr(x)) until you understand exactly
how things work...
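one possible way to run that experiment, assuming Python 3 print syntax (the names cooked and raw are just for illustration):

```python
import re

p = re.compile(r'(\r)[^\n]')

cooked = '\rx'   # 2 characters: carriage return, "x"
raw = r'\rx'     # 3 characters: backslash, "r", "x"

print(repr(cooked), len(cooked))
print(repr(raw), len(raw))

# the pattern is written as a raw literal, but the regex engine
# itself interprets \r as a carriage return
print(p.match(cooked) is not None)   # the string starts with a real CR
print(p.match(raw) is not None)      # the string starts with a backslash
```

the raw string never contains a carriage return, which is why the pattern cannot match it.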

:::

> I want to translate text files from unix newlines to windows\dos

you don't need regular expressions for that; the easiest way to
convert any kind of line endings to the local format is to open the
source file with the "U" flag:

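    infile = open(filename, "rU") # universal line endings
    outfile = open(outfilename, "w") # text mode is default

    s = infile.read() # whole file; every line ending arrives as "\n"
    outfile.write(s) # text mode writes "\n" in the local format

    infile.close()
    outfile.close()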
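for readers on Python 3: the "U" flag was removed in Python 3.11, and universal newline handling is the default for text mode there. a rough equivalent might look like the sketch below. the helper name to_crlf and the throwaway file names are assumptions made for this example, and newline="\r\n" is used to force Windows endings instead of the local format:

```python
import os
import tempfile

def to_crlf(src_path, dst_path):
    # hypothetical helper; newline=None (the default) turns "\r", "\n"
    # and "\r\n" into "\n" when reading
    with open(src_path, "r", newline=None) as infile:
        text = infile.read()
    # newline="\r\n" writes every "\n" as "\r\n", on any platform
    with open(dst_path, "w", newline="\r\n") as outfile:
        outfile.write(text)

# small demonstration with throwaway files holding mixed line endings
tmpdir = tempfile.mkdtemp()
src = os.path.join(tmpdir, "mixed.txt")
dst = os.path.join(tmpdir, "dos.txt")
with open(src, "wb") as f:
    f.write(b"one\ntwo\r\nthree\r")
to_crlf(src, dst)
with open(dst, "rb") as f:
    result = f.read()
print(repr(result))
```

leaving out newline="\r\n" on the output file gives the local format instead, which is closer to what the snippet above does.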

:::

if you're converting files from Unix format to Windows format on a
Windows box, you don't have to do anything -- just open the files
in text mode, and Python's file I/O layer will fix the rest for you.
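a quick way to see what the I/O layer does on whatever box you are on, sketched for Python 3 (the file name is a throwaway): write in text mode, then read the bytes back in binary mode.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.txt")

# text mode: "\n" is written as os.linesep ("\r\n" on Windows, "\n" on Unix)
with open(path, "w") as f:
    f.write("one\ntwo\n")

# binary mode: no translation, so this shows what is really on disk
with open(path, "rb") as f:
    on_disk = f.read()

print(repr(os.linesep))
print(repr(on_disk))
```

on a Windows box the bytes should come back with \r\n line endings, on a Unix box with plain \n.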

</F>