how to write add frequency in particular file by reading a csv file and then making a new file of multiple csv file by adding frequency

Peter Otten __peter__ at web.de
Fri Jun 23 12:13:25 EDT 2017


Dennis Lee Bieber wrote:

> On Fri, 23 Jun 2017 09:49:06 +0300, Jussi Piitulainen
> <jussi.piitulainen at helsinki.fi> declaimed the following:
> 
>>I just like those character translation methods, and I didn't like it
>>when you first took the time to call a simple regex "line noise" and
>>then proceeded to post something that looked much more noisy yourself.
>>
> 
> Tediously long (and likely slow running), but I'd think each .replace()
> would have been self-explanatory.
> 
>>I'm not sure I like the splitting of look-alike (I'm not sure that I
>>like not splitting it either) but note that the regex does that for
>>free.
>>
>>The \b in the original regex matches the empty string at a position
>>where there is a "word character" on only one side. It recognizes a
>>boundary at the beginning of a line and at whitespace, but also at all
>>the punctuation marks.
>>
>>You guess right about the length limits. I wouldn't use them, and then
>>there's no need for the boundary markers any more: my \w+ matches
>>maximal sequences of word characters (even in foreign languages like
>>Finnish or French, and even in upper case, also digits).
>>
>>To also match "people's" and "didn't", use \w+'\w+, and to match with
>>and without the ' make the trailing part optional \w+('\w+)? except the
>>notation really does start to become noisy because one must prevent the
>>parentheses from "capturing" the group:
>>
>>import re
>>wordy = re.compile(r'''  \w+  (?: ' \w+ )? ''', re.VERBOSE)
>>text = '''
>>Oliver N'Goma, dit Noli, né le 23 mars 1959 à Mayumba et mort le 7 juin
>>2010, est un chanteur et guitariste gabonais d'Afro-zouk.
>>'''
>>
>>print(wordy.findall(text))
>>
>># ['Oliver', "N'Goma", 'dit', 'Noli', 'né', 'le', '23', 'mars', '1959',
>># 'à', 'Mayumba', 'et', 'mort', 'le', '7', 'juin', '2010', 'est', 'un',
>># 'chanteur', 'et', 'guitariste', 'gabonais', "d'Afro", 'zouk']
>>
>>Not too bad?
> 
> Above content saved (in a write-only file? I don't recall the times
> I've searched my post archives) for potential future use. I should plug it
> into my demo and see how much speed improvement I get.

Most of the potential speedup can be gained by using collections.Counter()
instead of the database. If necessary, write the counter's contents into the
database in a second step.
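
A minimal sketch of that two-step approach, not taken from the demo itself:
count in memory first, then store the totals in one batch. The table name
"freq", its columns, the lower-casing of words, and the sample lines are all
assumptions made for illustration; the regex is the one from the quoted post,
written without re.VERBOSE.

```python
import re
import sqlite3
from collections import Counter

# Same pattern as in the quoted post: a word with an optional 'suffix part.
wordy = re.compile(r"\w+(?:'\w+)?")

def count_words(lines):
    """Step 1: count word frequencies in memory (lower-casing is an assumption)."""
    counts = Counter()
    for line in lines:
        counts.update(word.lower() for word in wordy.findall(line))
    return counts

def save_counts(counts, db):
    """Step 2: write the finished counts to the database in a single batch."""
    # Hypothetical schema, chosen only for this example.
    db.execute(
        "CREATE TABLE IF NOT EXISTS freq (word TEXT PRIMARY KEY, n INTEGER)"
    )
    db.executemany("INSERT OR REPLACE INTO freq VALUES (?, ?)", counts.items())
    db.commit()

sample = ["The cat didn't see the other cat.", "The end."]
counts = count_words(sample)
print(counts.most_common(2))

db = sqlite3.connect(":memory:")
save_counts(counts, db)
print(db.execute("SELECT n FROM freq WHERE word = 'the'").fetchone())
```

For the original task you would pass the lines (or the relevant column) of
each CSV file to count_words() instead of the sample list.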



