ascii character - removing chars from string

bruce bedouglas at earthlink.net
Tue Jul 4 11:09:53 EDT 2006


simon...

the issue that i'm seeing is not a result of simply using the
'string.replace' function. it appears that there's something else going on
in the text....

although i can see the nbsp in the file, the file is manipulated by a number
of other functions prior to me writing the information out to a file.
somewhere the 'nbsp' is changed, so there's something else going on...

however, the error i get indicates that the char 'u\xa0' is what's causing
the issue.. as far as i can determine, the string.replace can't/doesn't
handle non-ascii chars. i'm still looking for a way to search/replace
non-ascii chars...

this would/should resolve my issue..
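
[note] A minimal sketch, assuming a current Python 3 interpreter where every str is Unicode, suggesting that replace itself copes with non-ascii chars such as the no-break space. The names NBSP and replace_nbsp are hypothetical and only for illustration. In the Python 2 of this thread, one plausible cause of such an error is mixing a byte string with a unicode string, but the thread does not show the actual traceback.

```python
# Python 3 sketch: str.replace works on non-ASCII characters as well.
NBSP = "\xa0"  # U+00A0, NO-BREAK SPACE (what the 'nbsp' entity stands for)


def replace_nbsp(text, replacement=" "):
    """Return text with every U+00A0 replaced by `replacement`."""
    return text.replace(NBSP, replacement)


sample = "foo\xa0cat\xa0"
cleaned = replace_nbsp(sample).strip()
print(repr(cleaned))
```

If the data arrives as bytes instead, it would need to be decoded with the right encoding first so that the search string and the text are the same type.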

-bruce


-----Original Message-----
From: python-list-bounces+bedouglas=earthlink.net at python.org
[mailto:python-list-bounces+bedouglas=earthlink.net at python.org]On Behalf
Of Simon Forman
Sent: Monday, July 03, 2006 11:28 PM
To: python-list at python.org
Subject: Re: ascii character - removing chars from string


bruce wrote:
> simon...
>
> the '&nbsp;' is not to be seen/viewed as text/ascii.. it's a representation
> of a hex 'u\xa0' if i recall...

Did you not see this part of the post that you're replying to?

>  'nbsp': '\xa0',

My point was not that '\xa0' is an ascii character... It was that your
initial request was very misleading:

"i'm running into a problem where i'm seeing non-ascii chars in the
parsing i'm doing. in looking through various docs, i can't find
functions to remove/restrict strings to valid ascii chars."

That's why you got three different answers to the wrong question.

You weren't "seeing non-ascii chars" at all.  You were seeing ascii
representations of html entities that, in the case of '&nbsp;', happen
to represent non-ascii values.

>
> i'm looking to remove or replace the instances with a ' ' (space)

Simplicity:

s.replace('&nbsp;', ' ')
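
[note] A hedged sketch of the two cases discussed in this thread, assuming Python 3.4 or later: the text may still contain the literal entity '&nbsp;', or it may already have been decoded to '\xa0' by an earlier step. html.unescape is the standard-library call used here in place of the htmlentitydefs table from the Python 2 era; the function names are hypothetical.

```python
import html


def strip_nbsp_entity(text):
    """Replace the literal six-character entity '&nbsp;' with a space."""
    return text.replace("&nbsp;", " ")


def entities_to_spaces(text):
    """Decode all HTML entities, then turn no-break spaces into spaces."""
    decoded = html.unescape(text)  # '&nbsp;' becomes '\xa0', '&amp;' becomes '&'
    return decoded.replace("\xa0", " ")


print(repr(strip_nbsp_entity("foo cat&nbsp;").strip()))
print(repr(entities_to_spaces("foo&nbsp;cat &amp; dog")))
```

The second function also covers text where the entity was already converted to '\xa0' upstream, at the cost of decoding every other entity too.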

~Simon

"You keep using that word.  I do not think it means what you think it
means."
 -Inigo Montoya, "The Princess Bride"

>
> -bruce
>
>
> -----Original Message-----
> From: python-list-bounces+bedouglas=earthlink.net at python.org
> [mailto:python-list-bounces+bedouglas=earthlink.net at python.org]On Behalf
> Of Simon Forman
> Sent: Monday, July 03, 2006 7:17 PM
> To: python-list at python.org
> Subject: Re: ascii character - removing chars from string
>
>
> bruce wrote:
> > hi...
> >
> > update. i'm getting back html, and i'm getting strings like " foo &nbsp;"
> > which is valid HTML as the '&nbsp;' is a space.
>
> &, n, b, s, p, ;  Those are all ascii characters.
>
> > i need a way of stripping/removing the '&nbsp;' from the string
> >
> > the &nbsp; needs to be treated as a single char...
> >
> >  text = "foo cat &nbsp;"
> >
> >  ie ok_text = strip(text)
> >
> >  ok_text = "foo cat"
>
> Do you really want to remove those html entities?  Or would you rather
> convert them back into the actual text they represent?  Do you just
> want to deal with &nbsp;'s?  Or maybe the other possible entities that
> might appear also?
>
> Check out htmlentitydefs.entitydefs (see
> http://docs.python.org/lib/module-htmlentitydefs.html)  it's kind of
> ugly looking so maybe use pprint to print it:
>
> >>> import htmlentitydefs, pprint
> >>> pprint.pprint(htmlentitydefs.entitydefs)
> {'AElig': 'Æ',
>  'Aacute': 'Á',
>  'Acirc': 'Â',
> .
> .
> .
>  'nbsp': '\xa0',
> .
> .
> .
> etc...
>
>
> HTH,
> ~Simon
>
> "You keep using that word.  I do not think it means what you think it
> means."
>  -Inigo Montoya, "The Princess Bride"
>
> --
> http://mail.python.org/mailman/listinfo/python-list
