unicode .replace not working - why?

Sun Oct 12 04:00:26 EDT 2008

Kurt Peters wrote:

> I had done that about 21 revisions ago.  

If you litter your module with commented-out code, it is hard to keep
track of what works and what doesn't.

> Nevertheless, why would you think 
> that would work, when the code as shown doesn't?

Because he knows Python? Why don't /you/ try it before asking that question?

A good place to do "exploratory" programming is Python's interactive
interpreter. Here's a sample session:

Python 2.5.1 (r251:54863, Jul 31 2008, 23:17:43)
[GCC 4.1.3 20070929 (prerelease) (Ubuntu 4.1.2-16ubuntu2)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from pyPdf import PdfFileReader as PFR
>>> doc = PFR(open("SUA.pdf"))
>>> text = doc.getPage(3).extractText()
>>> type(text)
<type 'unicode'>
>>> text[:200]
u'2/16/08                7400.8P Table of Contents - Continued  Section Page  \xa773.49  New Hampshire (NH) 50 \xa773.50  New Jersey (NJ) 50 \xa773.51  New Mexico (NM) 51 \xa773.52  New York (NY) 56 \xa773.53  North '
>>> print text[:200].replace(u"\xa7", u"\n")
2/16/08                7400.8P Table of Contents - Continued  Section Page
73.49  New Hampshire (NH) 50
73.50  New Jersey (NJ) 50
73.51  New Mexico (NM) 51
73.52  New York (NY) 56
73.53  North
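
For anyone without the PDF at hand, here is a minimal sketch of the same replacement outside the interpreter. The sample string and the helper name split_sections are made up for illustration; they are not from the original module. One point worth checking in code where .replace() seems to do nothing (a guess, since the failing code is not shown here): strings are immutable, so .replace() returns a new string and leaves the original untouched.

```python
# Hypothetical sample standing in for the text pyPdf extracted from the PDF.
sample = u"Section Page  \xa773.49  New Hampshire (NH) 50 \xa773.50  New Jersey (NJ) 50"


def split_sections(text, marker=u"\xa7"):
    """Return a copy of text with every section sign replaced by a newline."""
    # Both arguments are unicode, matching the unicode text being searched.
    return text.replace(marker, u"\n")


# The result must be kept: replace() does not modify sample in place.
result = split_sections(sample)
lines = result.split(u"\n")

for line in lines:
    print(line)
```

If the return value is discarded (a bare text.replace(...) on a line of its own), the original text still contains the section signs afterwards.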

Peter