Can't match str/unicode

CM cmpython at gmail.com
Sat Jan 7 16:40:42 EST 2017


This is probably very simple, but I get confused when it comes to encoding and am generally rusty. (What follows is in Python 2.7; I know.)

I'm scraping a Word .docx file using win32com and am just trying to apply some matching rules to find certain paragraphs. For testing purposes, I'm looking for paragraphs that equal the word 'match', which I know exists as its own "paragraph" in the target document. First, this is at the top of the file:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

Then this is the relevant code:

candidate_text = Paragraph.Range.Text.encode('utf-8')
print 'This is candidate_text:', candidate_text
print type(candidate_text)
print type('match')
print candidate_text == 'match'
if candidate_text == 'match':
    # do something...
    pass

And that section produces this:

This is candidate_text: match
<type 'str'>
<type 'str'>
False

and, of course, the code doesn't enter that "do something" block, since apparently candidate_text != 'match', even though it looks like it should.

So what's going on here? Why isn't a string with the content 'match' equal to another string with the content 'match'?
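When two strings print identically but compare unequal, one way to investigate is to look at their repr() and len(). Both expose characters that print() hides, such as trailing whitespace or control characters. A likely suspect here is Word's paragraph mark, a carriage return ('\r') at the end of a paragraph's Range.Text. The minimal sketch below assumes exactly that. The string literal is a hypothetical stand-in for the win32com call, which isn't reproduced here. The code runs under both Python 2.7 and Python 3.

```python
# Hypothetical stand-in for Paragraph.Range.Text; the trailing '\r'
# is an assumption used to illustrate an invisible character.
candidate_text = 'match\r'

print(candidate_text)             # the '\r' is invisible when printed
print(repr(candidate_text))       # repr() makes it visible: 'match\r'
print(len(candidate_text))        # 6, not 5
print(candidate_text == 'match')  # False
```

If repr() shows an extra character like this, the comparison is behaving correctly and the data is the problem.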

I've also tried removing the .encode() call and the coding declaration at the very top, but then candidate_text is a unicode object that I also can't get to match anything.
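Whether candidate_text is a byte string or a unicode object, equality will still fail if it carries extra characters. Below is a minimal sketch that strips surrounding whitespace (spaces, tabs, carriage returns, newlines) before comparing against a unicode literal. It assumes the paragraph text differs from the target only by such characters. `paragraph_matches` is a hypothetical helper, and the sample strings stand in for text read through win32com.

```python
def paragraph_matches(text, target=u'match'):
    """Return True if text equals target once leading and trailing
    whitespace (including carriage returns and newlines) is removed."""
    # Comparing against a unicode literal avoids relying on Python 2's
    # implicit ASCII decoding of byte-string literals.
    return text.strip() == target


samples = [u'match\r', u'match', u'  match \r\n', u'no match\r']
for s in samples:
    # %r shows any hidden characters in each sample.
    print('%r -> %s' % (s, paragraph_matches(s)))
```

Keeping the value as unicode (skipping the .encode() call) and comparing against u'match' sidesteps the str/unicode mixing. The strip() call handles the hidden trailing character.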

What am I doing wrong? How should I approach this? Thanks.



More information about the Python-list mailing list