Parsing strings -> numbers

Tuang tuanglen at hotmail.com
Tue Nov 25 15:43:47 EST 2003


Skip Montanaro <skip at pobox.com> wrote:
> 
> Be careful if you're scraping web pages which might not use the same charset
> as you do.  You may find something like:
> 
>     $123.456,78
> 
> as a quote price on a European website.  I don't know how to tell what the
> remote site used as its locale when formatting numeric data.  Perhaps
> knowing the charset of the page is sufficient to make an educated guess.

Thanks, Skip. I'm not planning a shady screen-scraping operation or
anything of that sort. This is more of a generic question about how to
use Python as a convenient utility language.

Sometimes I'll find a table of interesting data somewhere as I'm just
surfing around the Web, and I'll want to grab the data and play with
it a bit. At that scale of operation, I can just look at the page
source and figure out the encoding, what the currency is, etc. I know
how to turn a formatted string into a usable number in other languages
that I use (though I might have to check the docs in some cases to
remind myself of the details), and since the docs didn't really make
it obvious what the "one clear and obvious way to do it" was in
Python, I thought I'd ask.

It appears as though Python doesn't (yet) have the same formal support
for format parsing and internationalization that languages like C# and
Java have, but that's okay for now. I just wanted to make sure I
didn't start creating my own naive, homemade equivalents of functions
that are already part of the standard API.
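
For concreteness, here is roughly the kind of homemade helper I have in
mind. It is only a sketch: the name parse_amount and its two separator
parameters are my own invention, not anything from the standard
library, and it assumes I have already looked at the page and know
which character is the decimal point and which is the thousands
separator. It uses the decimal module from current Python to avoid
float rounding on prices.

```python
from decimal import Decimal

def parse_amount(text, decimal_sep=".", group_sep=","):
    """Turn a formatted string such as '$1,234.56' into a Decimal.

    Hypothetical helper: the separators are passed in explicitly
    because, at this scale, they are read off the page by hand.
    """
    # Keep only ASCII digits, a sign, and the two separator characters;
    # this drops currency symbols and surrounding whitespace.
    allowed = "0123456789+-" + decimal_sep + group_sep
    kept = "".join(ch for ch in text if ch in allowed)
    # Remove the grouping character, then normalise the decimal mark.
    cleaned = kept.replace(group_sep, "").replace(decimal_sep, ".")
    return Decimal(cleaned)

print(parse_amount("$1,234.56"))                                    # 1234.56
print(parse_amount("$123.456,78", decimal_sep=",", group_sep="."))  # 123456.78
```

The second call covers the European-style example Skip quoted. The
sketch makes no attempt to guess the convention from the string itself,
and it does not handle things like negatives written in parentheses.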



