unicode confusing

Pet petshmidt at googlemail.com
Tue May 26 04:09:37 EDT 2009


On May 26, 9:29 am, Pet <petshm... at googlemail.com> wrote:
> On May 25, 6:07 pm, Paul Boddie <p... at boddie.org.uk> wrote:
>
> > On 25 Mai, 17:39, someone <petshm... at googlemail.com> wrote:
>
> > > Hi,
>
> > > Reading the content of a web page (encoded in UTF-8) with urllib2, I
> > > can't get the parsed data into the DB.
>
> > > Exception:
>
> > >   File "/usr/lib/python2.5/site-packages/pyPgSQL/PgSQL.py", line 3111, in execute
> > >     raise OperationalError, msg
> > > libpq.OperationalError: ERROR:  invalid UTF-8 byte sequence detected
> > > near byte 0xe4
>
> > > I've already checked several Python Unicode tutorials, but I have no
> > > idea how to solve my problem.
>
> > With pyPgSQL, there are a few tricks that you have to take into
> > account:
>
> > 1. With PostgreSQL, it would appear advantageous to create databases
> > using the "-E unicode" option.
>
> Hi,
>
> The DB is in UTF-8.
>
> > 2. When connecting, use the client_encoding and unicode_results
> > arguments for the connect function call:
>
> >   connection = PgSQL.connect(client_encoding="utf-8", unicode_results=1)
>
> If I set unicode_results=1, I get exceptions in other places,
> e.g. urllib.urlencode(values) can't encode the values.
>
> > 3. After connecting, it appears necessary to set the client encoding
> > explicitly:
>
> >   connection.cursor().execute("set client_encoding to unicode")
>
> I've tried this as well, but I still get exceptions.
>
> > I'd appreciate any suggestions which improve on the above, but what
> > this should allow you to do is to present Unicode objects to the
> > database and to receive such objects from queries. Whether you can
> > relax this and pass UTF-8-encoded strings instead of Unicode objects
> > is not something I can guarantee, but it's usually recommended that
> > you manipulate Unicode objects in your program where possible, and
> > here you should be able to let pyPgSQL deal with the encodings
> > preferred by the database.
>
> Thanks for your suggestions! Sadly, I can't solve my problem...
>
> Pet
>
> > Paul

After a while, I tried converting the result with unicode(result,
'ISO-8859-15'), and that was it :)
I had assumed the page was already UTF-8 because of the charset declared
in the <meta> tag of the web page I'm fetching.
Pet
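For readers on Python 3 (the thread uses Python 2.5, where the fix was unicode(result, 'ISO-8859-15')), here is a minimal sketch of the same idea: decode the fetched bytes with the correct codec before they reach the database. It is an illustration, not the poster's code. decode_page and charset_from_content_type are hypothetical helpers, the sample bytes are made up, and the "try UTF-8, then fall back to ISO-8859-15" policy is an assumption that only suits pages known to use one of those two encodings. It also shows why the error pointed at byte 0xe4, and it reads the charset from the HTTP Content-Type header, which may disagree with the page's <meta> declaration.

```python
from email.message import Message


def charset_from_content_type(content_type, default="utf-8"):
    """Return the charset declared in an HTTP Content-Type header value.

    Hypothetical helper: for a live response, urllib.request.urlopen(url)
    .headers.get_content_charset() performs the same lookup.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset(default)  # lower-cased, e.g. "iso-8859-15"


def decode_page(data, declared="utf-8", fallback="iso-8859-15"):
    """Decode fetched bytes to text (hypothetical helper).

    Tries the declared charset first; if the bytes are not valid in that
    charset, falls back to ISO-8859-15, which assigns a character to every
    byte value and therefore never raises (though the result may still be
    wrong if the page really uses some other encoding).
    """
    try:
        return data.decode(declared)
    except UnicodeDecodeError:
        return data.decode(fallback)


# Hypothetical page body: "Käse" encoded as ISO-8859-15. Byte 0xe4 is "ä"
# there, but in UTF-8 it starts a three-byte sequence and the following "s"
# is not a continuation byte, so strict UTF-8 decoding fails at 0xe4.
raw = b"K\xe4se"

# No charset in this (assumed) header, so decode_page tries UTF-8 first.
text = decode_page(raw, declared=charset_from_content_type("text/html"))
print(text.encode("utf-8"))  # b'K\xc3\xa4se': valid UTF-8, safe to store
```

The point of the design is to decode once, at the boundary where bytes enter the program, and to work with text from then on; encoding back to UTF-8 (or letting the database driver do it) happens only when the data leaves again.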


