Performance of int/long in Python 3
rusi
rustompmody at gmail.com
Wed Apr 3 04:07:07 EDT 2013
On Apr 3, 12:37 pm, Neil Hodgson <nhodg... at iinet.net.au> wrote:
> Reran the programs taking a bit more care with the encoding of the
> file. This had no effect on the speeds. Only a small number of
> paths don't fit into ASCII:
>
> ASCII 1076101
> Latin1 218
> BMP 113
> Astral 0
>
> # encoding:utf-8
> import codecs, os, time
> from os.path import join, getsize
> with codecs.open("filelist.txt", "r", "utf-8") as f:
>     paths = f.read().split("\n")
> bucket = [0, 0, 0, 0]
> for p in paths:
>     b = 0
>     maxChar = max([ord(ch) for ch in p])
>     if maxChar >= 65536:
>         b = 3
>     elif maxChar >= 256:
>         b = 2
>     elif maxChar >= 128:
>         b = 1
>     bucket[b] = bucket[b] + 1
> print("ASCII", bucket[0])
> print("Latin1", bucket[1])
> print("BMP", bucket[2])
> print("Astral", bucket[3])
>
> Neil
Can you please try one more experiment, Neil?
Knock off all non-ASCII strings (paths) from your dataset and try
again.
[It should take little more than converting your above code into a
filter:
  if b == 0: print the path
  if b > 0: ignore it
]
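The suggested filter could be sketched roughly as follows. This is only an illustration of the idea, not Neil's actual code: the sample `paths` list stands in for the contents of filelist.txt, and `is_ascii` is a hypothetical helper name.

```python
# Sketch of the proposed filter: keep only all-ASCII paths,
# discard the rest. The sample list stands in for filelist.txt.
paths = ["src/main.py", "docs/r\u00e9sum\u00e9.txt", "data/plan.csv"]

def is_ascii(s):
    # True when every character fits in 7-bit ASCII (codepoint < 128)
    return all(ord(ch) < 128 for ch in s)

ascii_only = [p for p in paths if is_ascii(p)]
print(ascii_only)
```

Timing the original benchmark on `ascii_only` alone would show whether the handful of non-ASCII paths is what drags the string operations onto a wider internal representation.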
More information about the Python-list mailing list