Parsing strings -> numbers

Duncan Booth duncan at NOSPAMrcp.co.uk
Tue Nov 25 06:31:03 EST 2003


tuanglen at hotmail.com (Tuang) wrote in 
news:df045d93.0311250127.67395ae at posting.google.com:

>>>> locale.getdefaultlocale()
> ('en_US', 'cp1252')
>>>> locale.atoi("-12345")
> -12345
> 
> Given the locale it thinks I have, it should be able to parse
> "-12,345" if it can handle formats containing thousands separators,
> but apparently it can't.
> 
> If Python doesn't actually have its own parsing of formatted numbers,
> what's the preferred Python approach for taking data, perhaps
> formatted currencies such as "-$12,345.00" scraped off a Web page, and
> turning it into numerical data?
> 

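The problem is that the numeric locale (LC_NUMERIC) is not set by default: 
it stays in the "C" locale, which has no thousands separator, so 
locale.atoi() and locale.atof() reject the grouped digits. You have to set 
it explicitly: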

>>> import locale
>>> locale.getlocale(locale.LC_NUMERIC)
(None, None)
>>> locale.getlocale()
['English_United Kingdom', '1252']
>>> locale.setlocale(locale.LC_NUMERIC, "English")
'English_United States.1252'
>>> locale.atof('1,234')
1234.0
>>> locale.setlocale(locale.LC_NUMERIC, "French")
'French_France.1252'
>>> locale.atof('1,234')
1.234

Unless I've missed something, the locale module has no way to ignore 
currency symbols when parsing numbers, so you still can't handle 
"-$12,345.00" even if you set both the numeric and monetary locales.
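
One way round that is to strip the currency symbol yourself before 
converting. The following is only a minimal sketch under stated assumptions: 
parse_currency is a hypothetical helper, not part of the locale module; it 
assumes a leading "-" is the only way a negative amount is written, and it 
takes the separators as arguments instead of reading them from the locale.

```python
def parse_currency(text, thousands_sep=",", decimal_point="."):
    """Turn a string such as "-$12,345.00" into a float (illustration only)."""
    stripped = text.strip()
    # Assumption: a negative amount is marked by a leading minus sign.
    negative = stripped.startswith("-")
    # Remove grouping separators, then keep only digits and the decimal point.
    # Currency symbols, signs and spaces are all discarded by this filter.
    ungrouped = stripped.replace(thousands_sep, "")
    kept = "".join(ch for ch in ungrouped if ch.isdigit() or ch == decimal_point)
    if not any(ch.isdigit() for ch in kept):
        raise ValueError("no digits found in %r" % (text,))
    # float() expects "." as the decimal point.
    value = float(kept.replace(decimal_point, "."))
    return -value if negative else value


amount = parse_currency("-$12,345.00")
print(amount)
```

For real monetary data it would probably be better to build a 
decimal.Decimal from the cleaned string instead of a float, since binary 
floats cannot represent most decimal fractions exactly.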

-- 
Duncan Booth                                             duncan at rcp.co.uk
int month(char *p){return(124864/((p[0]+p[1]-p[2]&0x1f)+1)%12)["\5\x8\3"
"\6\7\xb\1\x9\xa\2\0\4"];} // Who said my code was obscure?



