[Python-ideas] Move optional data out of pyc files
George Fischhof
george at fischhof.hu
Fri Apr 13 18:11:55 EDT 2018
2018-04-11 2:03 GMT+02:00 Steven D'Aprano <steve at pearwood.info>:
[snip]
> I shouldn't think that the number of files on disk is very important,
> now that they're hidden away in the __pycache__ directory where they can
> be ignored by humans. Even venerable old FAT32 has a limit of 65,534
> files in a single folder, and 268,435,437 on the entire volume. So
> unless the std lib expands to 16000+ modules, the number of files in the
> __pycache__ directory ought to be well below that limit.
>
[snip]
Hi all,
Just for information for everyone:
I was a VMS system manager more than a decade ago, and I know that Windows NT
(at least its core) was developed by a former VMS engineer. NTFS was created
on the basis of the Files-11 (Files-11B) file system. In both file systems
the directory is a tree (in Files-11 it is a B-tree; in NTFS it may be a
different kind of tree, but it is still a tree) that holds the files in
alphabetical order.
If there are too many files in a directory, accessing files becomes slower
(see, for example, the windows\system32 folder).
Of course, it does not matter if there are a few hundred or one or two
thousand files, but beyond that the number does matter.
I did a small measurement (I intentionally did not use functions, so as not
to distort the result):
import os
import time

# Create the test directory; ignore the error if it already exists.
try:
    os.mkdir('tmp_thousands_of_files')
except FileExistsError:
    pass
name1 = 10001
start = time.time()
file_name = 'tmp_thousands_of_files/' + str(name1)
f = open(file_name, 'w')
f.write('aaa')
f.close()
stop = time.time()
file_time = stop-start
print(f'one file time {file_time} \n {start} \n {stop}')
# Fill the directory with roughly 10k files named 10002..19999.
for i in range(10002, 20000):
    file_name = 'tmp_thousands_of_files/' + str(i)
    f = open(file_name, 'w')
    f.write('aaa')
    f.close()
name2 = 10000
start = time.time()
file_name = 'tmp_thousands_of_files/' + str(name2)
f = open(file_name, 'w')
f.write('aaa')
f.close()
stop = time.time()
file_time = stop-start
print(f'after 10k, name before {file_time} \n {start} \n {stop}')
name3 = 20010
start = time.time()
file_name = 'tmp_thousands_of_files/' + str(name3)
f = open(file_name, 'w')
f.write('aaa')
f.close()
stop = time.time()
file_time = stop-start
print(f'after 10k, name after {file_time} \n {start} \n {stop}')
"""
result
c:\>python several_files_in_one_folder.py
one file time 0.0
1523476699.5144918
1523476699.5144918
after 10k, name before 0.015625953674316406
1523476714.622918
1523476714.6385438
after 10k, name after 0.0
1523476714.6385438
1523476714.6385438
"""
Used: Python 3.6.1, Windows 8.1, SSD drive.
As you can see, when there is an insertion at the beginning of the tree it
is much slower than adding to the end. (Yes, I know that list insertion is
slow as well, but I once saw a VMS directory with 50k files where the dir
command listed 5-10 files and then waited some seconds before the next 5-10
files ... ;-) )
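
A possible refinement of the measurement, offered only as a sketch: on
Windows, time.time() typically advances in steps of about 15.6 ms, and the
readings of 0.0 and roughly 0.0156 s above are consistent with that step
size, so a higher-resolution timer may give a clearer picture. The variant
below uses time.perf_counter() and a temporary directory. The helper names
time_create and measure and the default count of 2000 are illustrative
assumptions and are not part of the original script.

```python
import os
import tempfile
import time


def time_create(directory, name):
    """Return the seconds taken to create and write one small file."""
    path = os.path.join(directory, str(name))
    start = time.perf_counter()  # high-resolution, monotonic timer
    with open(path, 'w') as f:
        f.write('aaa')
    return time.perf_counter() - start


def measure(count=2000):
    """Time one creation in an empty directory, then two more once
    `count` files exist: one name sorting before them, one after."""
    with tempfile.TemporaryDirectory() as directory:
        first = time_create(directory, 10001)
        # Fill the directory with `count` consecutively named files.
        for i in range(10002, 10002 + count):
            time_create(directory, i)
        before = time_create(directory, 10000)
        after = time_create(directory, 10002 + count + 10)
    return {'empty': first, 'before': before, 'after': after}


if __name__ == '__main__':
    for label, seconds in measure().items():
        print(f'{label}: {seconds:.6f} s')
```

The timer starts and stops inside the helper, so the function-call overhead
is not part of the measured interval. A single sample per case is still
noisy; repeating the run several times and comparing the spread would be
more convincing than any one number.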
BR,
George