how to fast processing one million strings to remove quotes

Nick Mellor thebalancepro at gmail.com
Fri Aug 4 00:21:11 EDT 2017


Sorry Daiyue,

Try this correction: I'm writing code without being able to execute it.

 
> split_on_dbl_dbl_quote = original_list.join('|').split('""')
> remove_dbl_dbl_quotes_and_outer_quotes = split_on_dbl_dbl_quote[::2].join('').split('|')

split_on_dbl_dbl_quote = '|'.join(original_list).split('""')
remove_dbl_dbl_quotes_and_outer_quotes = '"'.join(split_on_dbl_dbl_quote[::2]).split('|')
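
For reference, a minimal runnable sketch of the same join/split idea. It rests on assumptions that may not match your data: the goal is simply to drop every double-quote character, every element is already a str, and the separator character never occurs inside any element. The names strip_quotes and sep are hypothetical, chosen only for illustration.

```python
def strip_quotes(strings, sep="|"):
    """Remove every double-quote character from each string in one pass.

    Assumptions (not checked here): all items are str, and `sep`
    never occurs inside any item.
    """
    if not strings:
        # ''.split(sep) would return [''], so handle the empty list explicitly.
        return []
    # Glue everything into one big string so the work happens in C, not in a Python loop.
    joined = sep.join(strings)
    # Splitting on '"' and re-joining with '' deletes every quote character.
    unquoted = "".join(joined.split('"'))
    # Cut the big string back into the original elements.
    return unquoted.split(sep)


if __name__ == "__main__":
    print(strip_quotes(['"abc"', '""def""', 'ghi']))
```

If the separator could appear in the data, pick a character that cannot (a control character, say), or fall back to a per-element loop.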

Cheers,

Nick

> 
> You need to be sure of your data: [::2] (return just even-numbered elements) relies on all double-double-quotes both opening and closing within the same string.
> 
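> This runs in under a second for a million strings but does affect *all* elements, not just strings. Note that str.join raises TypeError on non-string elements, so they would have to be converted with str() first and would come out as strings after the second statement.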
> 
> As to multi-processing: I would be looking at well-optimised single-thread solutions like split/join before I consider MP. If you can fit the problem to a split-join it'll be much simpler and more "pythonic".
> 
> Cheers,
> 
> Nick



