Regex Matching on Readline()

jwwest jwwest at gmail.com
Thu Dec 20 15:21:02 EST 2007


On Dec 20, 2:13 pm, John Machin <sjmac... at lexicon.net> wrote:
> On Dec 21, 6:50 am, jwwest <jww... at gmail.com> wrote:
>
>
>
> > Anyone have any trouble pattern matching on lines returned by
> > readline? Here's an example:
>
> > string = "Accounting - General"
> > pat = ".+\s-"
>
> > Should match on "Accounting -". However, if I read that string in from
> > a file it will not match. In fact, I can't get anything to match
> > except ".*".
>
> > I'm almost certain that it has something to do with the characters
> > that python returns from readline(). If I have this in a file:
>
> > Accounting - General
>
> > And do a:
>
> > line = f.readline()
> > print line
>
> > I get:
>
> > A c c o u n t i n g  -  G e n e r a l
>
> > Not sure why, I'm a nub at Python so any help is appreciated. They
> > look like spaces to me, but aren't (I've tried matching on spaces too).
>
> > - james
>
> To find out what the pseudo-spaces are, do this:
>
>     print repr(open("the_file", "rb").read()[:100])
>
> and show us (copy/paste) what you get.
>
> Also, tell us what platform you are running Python on, and how the
> file was created (by what software, on what platform).

Here's my output:
'A\x00c\x00c\x00o\x00u\x00n\x00t\x00i\x00n\x00g\x00 \x00-\x00 \x00G
\x00e\x00n\x00e\x00r\x00a\x00l\x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00
\x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00
\x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00'

I'm running Python on Windows. The file was initially created as
output from SQL Management Studio. I've re-saved it using TextPad
which tells me it's Unicode and PC formatted.
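
For anyone reading this later: the \x00 after every character in the output above is what UTF-16 little-endian text looks like when read as raw bytes, and the NUL between the space and the "-" is why ".+\s-" never matches. Below is a minimal sketch of decoding before matching. It assumes the file really is UTF-16-LE with no byte-order mark, which is what the repr() output suggests. The function name first_match and the constant PAT are made up for illustration and are not from the thread.

```python
import io
import re

# Same pattern as in the original question, written as a raw string.
PAT = re.compile(r".+\s-")


def first_match(path, encoding="utf-16-le"):
    """Return the text matched on the first line of the file, or None.

    The default encoding is an assumption based on the repr() output
    (no BOM, low byte first). If the file starts with a byte-order
    mark, passing encoding="utf-16" is probably the better choice.
    """
    # io.open decodes the bytes while reading, so readline() returns
    # real text instead of characters interleaved with NULs.
    with io.open(path, encoding=encoding) as f:
        line = f.readline()
    m = PAT.match(line)
    return m.group(0) if m else None
```

Called as first_match("the_file"), this should return the "Accounting -" prefix for a file like the one shown, provided the encoding assumption holds. In Python 2, codecs.open(path, encoding=...) would be a similar option.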


