Using dictionary key as a regular expression class

Chris Jones cjns1989 at gmail.com
Fri Jan 22 21:58:48 EST 2010


On Fri, Jan 22, 2010 at 08:46:35PM EST, Terry Reedy wrote:
> On 1/22/2010 4:47 PM, Chris Jones wrote:
>> I was writing a script that counts occurrences of characters in source code files:
>>
>> #!/usr/bin/python
>> import codecs
>> tcounters = {}
>> f = codecs.open('/home/gavron/git/screen/src/screen.c', 'r', "utf-8")
>> for uline in f:
>>    lline = []
>>    for char in uline[:-1]:
>>      lline += [char]
>
> Same but slower than lline.append(char); however, this loop just
> uselessly copies uline[:-1]

I'll change that. 

Do you mean I should just read the file one character at a time? 

That was my original intention but I couldn't find a way to do it.
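
For what it's worth, here is a rough sketch of what reading one character
at a time could look like. It assumes io.open (available from Python 2.6
and in Python 3) and a made-up helper name, iter_chars; neither comes from
the script above, and this is not meant as the definitive way to do it.

```python
import io

def iter_chars(path, encoding="utf-8"):
    """Yield the decoded characters of a text file, one at a time.

    iter_chars is a hypothetical helper name used for illustration only.
    """
    with io.open(path, "r", encoding=encoding) as f:
        while True:
            ch = f.read(1)  # in text mode, read(1) returns one character
            if not ch:      # an empty string signals end of file
                break
            yield ch
```

With something like this, the outer per-line loop goes away and the
counting loop becomes a plain "for ch in iter_chars(some_path):".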

>>    counters = {}
>>    for i in set(lline):
>>      counters[i] = lline.count(i)
>
> slow way to do this
>
>>    for c in counters.keys():
>>      if c in tcounters:
>>        tcounters[c] += counters[c]
>>      else:
>>        tcounters.update({c: counters[c]})
>
> I do not see the reason for intermediate dict

Couldn't find a way to increment the counters in the 'grand total'
dictionary. I always ended up with the counter values for the last input
line :-(

Moot point if I can do a for loop reading one character at a time till
end of file.
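
One possible way to bump the 'grand total' counters directly, without the
per-line dictionary, is dict.get with a default of 0. This is only a
sketch: count_chars and its parameters are names invented here, and it
strips the trailing newline the same way uline[:-1] was meant to.

```python
def count_chars(lines, totals=None):
    """Accumulate per-character counts from an iterable of lines.

    count_chars is a hypothetical name; totals is updated in place so the
    same dictionary can be reused across several files.
    """
    if totals is None:
        totals = {}
    for line in lines:
        for ch in line.rstrip("\n"):  # ignore the line terminator
            # get() returns 0 for a character not seen before
            totals[ch] = totals.get(ch, 0) + 1
    return totals
```

Because the dictionary is never reset inside the loop, the values no
longer reflect only the last input line.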

>>    counters = {}
>
> duplicate line

And totally useless since I never reference it after that. Something I
moved elsewhere and forgot to delete.

Sorry about that.

>> for c in tcounters.keys():
>>    print c, '\t', tcounters[c]

Literals, comments, €'s..?

> To only count ascii chars, as should be the case for C code,
>
> achars = [0]*95  # one slot per printable ASCII char, codes 32..126
> for c in open('xxx').read():
>   i = ord(c) - 32
>   if 0 <= i < 95:  # skip control chars and anything non-ASCII
>     achars[i] += 1
>
> for i,n in enumerate(achars):
>   print chr(i+32), n
>
> or sum subsets as desired.

Thanks much for the snippet, let me play with it and see if I can come
up with a Unicode/utf-8 version.. since while I'm at it I might as well
write something a bit more general than C code.

Since utf-8 is backward-compatible with 7-bit ASCII, this shouldn't be
a problem.
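
A small sketch of what the Unicode-aware counting might look like once
the text has been decoded: characters below code point 128 are plain
ASCII (and encode to the same single byte in utf-8), everything else
goes into a second dictionary. The name split_counts and the sample
string are assumptions made up for this illustration.

```python
def split_counts(text):
    """Count characters of decoded text, ASCII and non-ASCII separately.

    split_counts is a hypothetical helper, not part of the original script.
    """
    ascii_counts, other_counts = {}, {}
    for ch in text:
        # code points 0..127 are the 7-bit ASCII range
        target = ascii_counts if ord(ch) < 128 else other_counts
        target[ch] = target.get(ch, 0) + 1
    return ascii_counts, other_counts

# Made-up sample: a C-like line with a euro sign in a literal and a comment.
sample = u"x = '\u20ac'; /* \u20ac */"
ascii_counts, other_counts = split_counts(sample)
```

Counting decoded characters rather than raw bytes keeps a multi-byte
character such as the euro sign as a single entry.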

> Terry Jan Reedy

Thank you for your comments!

CJ


