howto remove the thousand separator

Steven D'Aprano steve+comp.lang.python at pearwood.info
Sun Apr 14 21:14:25 EDT 2013


On Sun, 14 Apr 2013 17:44:28 -0700, Mark Janssen wrote:

> On Sun, Apr 14, 2013 at 5:29 PM, Steven D'Aprano
> <steve+comp.lang.python at pearwood.info> wrote:
>> On Sun, 14 Apr 2013 12:06:12 -0700, Mark Janssen wrote:
>>
>>> cleaned=''
>>> for c in myStringNumber:
>>>    if c != ',':
>>>      cleaned+=c
>>> int(cleaned)
>>
>> ....due to being an O(N**2)  algorithm.
> 
> What on earth makes you think that is an O(n**2) algorithm and not O(n)?

Strings are immutable. Consider building up a single string from four 
substrings:

s = ''
s += 'fe'
s += 'fi'
s += 'fo'
s += 'fum'

Python *might* optimize the first concatenation, '' + 'fe', to just reuse 
'fe', (but it might not). Let's assume it does, so that no copying is 
needed. Then it gets to the second concatenation, and now it has to copy 
characters, because strings are immutable and cannot be modified in 
place. Showing the *running* total of characters copied:

'fe' + 'fi' => 'fefi'  # four characters copied
'fefi' + 'fo' => 'fefifo'  # 4 + 6 = ten characters copied
'fefifo' + 'fum' => 'fefifofum'  # 10 + 9 = nineteen characters copied

Notice how each intermediate substring gets copied repeatedly? In order 
to build up a string of length 9, we've had to copy at least 19 
characters. With only four substrings, it's not terribly obvious how 
badly this performs. So let's add some more substrings, and see how the 
running total increases:

'fefifofum' + 'foo' => 'fefifofumfoo'  # 19 + 12 = 31
'fefifofumfoo' + 'bar' => 'fefifofumfoobar'  # 31 + 15 = 46
'fefifofumfoobar' + 'baz' => 'fefifofumfoobarbaz'  # 46 + 18 = 64
'fefifofumfoobarbaz' + 'spam' => 'fefifofumfoobarbazspam'  # 64 + 22 = 86


To build up a string of length 22, we've had to copy, and re-copy, and re-
re-copy, 86 characters in total. And the string gets bigger, the 
inefficiency gets worse. Each substring (except the very last one) gets 
copied multiple times; the number of times it gets copied is proportional 
to the number of substrings.

If the substrings are individual characters, then each character is 
copied a number of times proportional to the number of characters N; 
since there are N characters, each being copied (proportional to) N 
times, that makes N*N or N**2.



-- 
Steven



More information about the Python-list mailing list