howto remove the thousand separator

Steven D'Aprano steve+comp.lang.python at pearwood.info
Mon Apr 15 03:03:15 EDT 2013


On Mon, 15 Apr 2013 03:19:43 +0100, Rotwang wrote:

> On 15/04/2013 02:14, Steven D'Aprano wrote:
>> On Sun, 14 Apr 2013 17:44:28 -0700, Mark Janssen wrote:
>>
>>> On Sun, Apr 14, 2013 at 5:29 PM, Steven D'Aprano
>>> <steve+comp.lang.python at pearwood.info> wrote:
>>>> On Sun, 14 Apr 2013 12:06:12 -0700, Mark Janssen wrote:
>>>>
>>>>> cleaned=''
>>>>> for c in myStringNumber:
>>>>>     if c != ',':
>>>>>       cleaned+=c
>>>>> int(cleaned)
>>>>
>>>> ....due to being an O(N**2)  algorithm.
>>>
>>> What on earth makes you think that is an O(n**2) algorithm and not
>>> O(n)?
>>
>> Strings are immutable. Consider building up a single string from four
>> substrings:
>>
>> s = ''
>> s += 'fe'
>> s += 'fi'
>> s += 'fo'
>> s += 'fum'
>>
>> Python *might* optimize the first concatenation, '' + 'fe', to just
>> reuse 'fe', (but it might not). Let's assume it does, so that no
>> copying is needed. Then it gets to the second concatenation, and now it
>> has to copy characters, because strings are immutable and cannot be
>> modified in place.
> 
> Actually, I believe that CPython is optimised to modify strings in place
> where possible, so that the above would surprisingly turn out to be
> O(n). See the following thread where I asked about this:

I deliberately didn't open that can of worms, mostly because I was in a 
hurry, but also because it's not an optimization you can rely on. It 
depends on the version, implementation, operating system, and the exact 
code running.

1) It only applies to code running under some, but not all, versions of 
CPython. It does not apply to PyPy, Jython, IronPython, and probably not 
other implementations.


2) Even under CPython, it can fail. It *will* fail if you have multiple 
references to the same strings. And it *may* fail depending on the 
vagaries of the memory management system in place, e.g. code that is 
optimized on Linux may fail to optimize under Windows, leading to slow 
code.


As far as I'm concerned, the best advice regarding this optimization is:

- always program as if it doesn't exist;

- but be glad it does when you're writing quick and dirty code in the 
interactive interpreter, where the convenience of string concatenation 
may be just too darn convenient to bother doing the right thing.



> http://groups.google.com/group/comp.lang.python/browse_thread/
thread/990a695fe2d85c52
> 
> (Sorry for linking to Google Groups. Does anyone know of a better c.l.p.
> web archive?)

The canonical (although possibly not the best) archive for c.l.p. is the 
python-list mailing list archive:

http://mail.python.org/mailman/listinfo/python-list


-- 
Steven



More information about the Python-list mailing list