unicode .replace not working - why?
Peter Otten
__peter__ at web.de
Sun Oct 12 04:00:26 EDT 2008
Kurt Peters wrote:
> I had done that about 21 revisions ago.
If you litter your module with code that is commented out it is hard to keep
track of what works and what doesn't.
> Nevertheless, why would you think
> that would work, when the code as shown doesn't?
Because he knows Python? Why don't /you/ try it before asking that question?
A good place to do "exploratory" programming is Python's interactive
interpreter. Here's a sample session:
Python 2.5.1 (r251:54863, Jul 31 2008, 23:17:43)
[GCC 4.1.3 20070929 (prerelease) (Ubuntu 4.1.2-16ubuntu2)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from pyPdf import PdfFileReader as PFR
>>> doc = PFR(open("SUA.pdf"))
>>> text = doc.getPage(3).extractText()
>>> type(text)
<type 'unicode'>
>>> text[:200]
u'2/16/08 7400.8P Table of Contents - Continued Section Page \xa773.49 New Hampshire (NH) 50 \xa773.50 New Jersey (NJ) 50 \xa773.51 New Mexico (NM) 51 \xa773.52 New York (NY) 56 \xa773.53 North '
>>> print text[:200].replace(u"\xa7", u"\n")
2/16/08 7400.8P Table of Contents - Continued Section Page
73.49 New Hampshire (NH) 50
73.50 New Jersey (NJ) 50
73.51 New Mexico (NM) 51
73.52 New York (NY) 56
73.53 North
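
If you'd rather try this without the PDF at hand, here is a minimal sketch of the same replacement on a hand-typed string. The sample text and the helper name split_on_section_sign are made up for illustration; they are not taken from SUA.pdf or from your module. The one thing it is meant to show is that .replace() returns a new string and leaves the original alone, so the result has to be kept or printed.

```python
# -*- coding: utf-8 -*-

# Hypothetical helper name, for illustration only.
def split_on_section_sign(text):
    # u"\xa7" is the section sign; replace() returns a NEW string.
    return text.replace(u"\xa7", u"\n")

# Hand-typed sample shaped like the extractText() result above (an assumption,
# not data read from the PDF).
sample = (u"Section Page \xa773.49 New Hampshire (NH) 50 "
          u"\xa773.50 New Jersey (NJ) 50")

result = split_on_section_sign(sample)

# The original string is unchanged; only the returned value has the newlines.
assert u"\xa7" in sample
assert u"\xa7" not in result

lines = [line.rstrip() for line in result.splitlines()]
for line in lines:
    print(line)
```

Calling sample.replace(...) on its own line and then printing sample would still show the section signs, because strings are immutable.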
Peter