How to add word frequencies to a particular file by reading a CSV file, and then make a new file from multiple CSV files by adding the frequencies

Jussi Piitulainen jussi.piitulainen at helsinki.fi
Thu Jun 22 15:46:28 EDT 2017


Dennis Lee Bieber writes:

>   # lowercase all, open hyphenated and / separated words, parens,
>   # etc.
>   ln = ln.lower().replace("/", " ").replace("-", " ").replace(".", " ")
>   ln = ln.replace("\\", " ").replace("[", " ").replace("]", " ")
>   ln = ln.replace("{", " ").replace("}", " ")
>   wds = ln.replace("(", " ").replace(")", " ").replace("\t", " ").split()

A pair of methods, str.maketrans to make a translation table and then
.translate on every string, lets you do all that in one step:

spacy = r'\/-.[]{}()'
tr = str.maketrans(dict.fromkeys(spacy, ' '))

...

ln = ln.translate(tr)

But str.maketrans is only in Python 3 (Python 2 has string.maketrans
in the string module, with different semantics).
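
For example (a minimal sketch, with a made-up sample line):

spacy = r'\/-.[]{}()'
tr = str.maketrans(dict.fromkeys(spacy, ' '))

ln = "Foo/bar-baz (quux)."
wds = ln.lower().translate(tr).split()
# wds == ['foo', 'bar', 'baz', 'quux']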

>   # for each word in the line
>   for wd in wds:
>       # strip off leading/trailing punctuation
>       wd = wd.strip("\\|'\";'[]{},<>?~!@#$%^&*_+= ")

You have already replaced several of those characters with spaces.

>       # do we still have a word? Skip any with still embedded
>       # punctuation
>       if wd and wd.isalpha():
>           # attempt to update the count for this word

But for quick and dirty work I might use a very simple regex, probably
literally this regex:

import re

wordy = re.compile(r'\w+')

...

for wd in wordy.findall(ln): # or .finditer, but I think it's newer
    ...
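
Put together, a quick word count along those lines might look like
this (a sketch only; words.txt is a hypothetical plain-text input, and
note that \w also matches digits and underscores, unlike str.isalpha):

import re
from collections import Counter

wordy = re.compile(r'\w+')
counts = Counter()

with open('words.txt', encoding='utf-8') as f:
    for ln in f:
        # count every \w+ run on the lowercased line
        counts.update(wordy.findall(ln.lower()))

for wd, n in counts.most_common():
    print(wd, n)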


However, if the OP really is getting their input from a CSV file, they
shouldn't need methods like these: surely it's then already an
unambiguous list of words, to be read in with the csv module? Or else
it isn't really CSV at all? I think they need to sit down with someone
who can walk them through the whole exercise.
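
If it really is CSV, the whole exercise is roughly this shape (a
sketch only, guessing at the layout: a hypothetical words.csv with
words in its fields, written out as word,count rows):

import csv
from collections import Counter

counts = Counter()
with open('words.csv', newline='') as f:
    for row in csv.reader(f):
        # count each non-empty field as one word
        counts.update(field.lower() for field in row if field)

with open('frequencies.csv', 'w', newline='') as f:
    csv.writer(f).writerows(counts.most_common())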


