[Python-ideas] Move optional data out of pyc files

George Fischhof george at fischhof.hu
Fri Apr 13 18:11:55 EDT 2018


2018-04-11 2:03 GMT+02:00 Steven D'Aprano <steve at pearwood.info>:
[snip]


> I shouldn't think that the number of files on disk is very important,
> now that they're hidden away in the __pycache__ directory where they can
> be ignored by humans. Even venerable old FAT32 has a limit of 65,534
> files in a single folder, and 268,435,437 on the entire volume. So
> unless the std lib expands to 16000+ modules, the number of files in the
> __pycache__ directory ought to be well below that limit.
>
[snip]

Hi all,

Just for everyone's information:
I was a VMS system manager more than a decade ago, and I know that Windows NT
(at least its core) was developed by a former VMS engineer, and that NTFS was
created on the basis of the Files-11 (Files-11B) file system. In both file
systems a directory is a tree (in Files-11 it is a B-tree; NTFS may use a
different kind of tree, but it is still a tree) that keeps the file names in
alphabetical order. If there are "too many" files, accessing them becomes
slower (check the windows\system32 folder, for example).

Of course it does not matter whether there are a few hundred or one or two
thousand files, but too many does matter.

I did a little measurement (intentionally without wrapping the code in
functions, so as not to distort the result):



import os
import time

try:
    os.mkdir('tmp_thousands_of_files')
except FileExistsError:
    pass

# Create one file in the (nearly) empty directory and time it.
name1 = 10001

start = time.time()
file_name = 'tmp_thousands_of_files/' + str(name1)
f = open(file_name, 'w')
f.write('aaa')
f.close()

stop = time.time()

file_time = stop - start

print(f'one file time {file_time} \n {start} \n {stop}')


# Fill the directory with roughly ten thousand more files.
for i in range(10002, 20000):
    file_name = 'tmp_thousands_of_files/' + str(i)
    f = open(file_name, 'w')
    f.write('aaa')
    f.close()


# Create a file whose name sorts *before* the existing entries,
# i.e. an insertion near the beginning of the directory tree.
name2 = 10000

start = time.time()
file_name = 'tmp_thousands_of_files/' + str(name2)
f = open(file_name, 'w')
f.write('aaa')
f.close()

stop = time.time()

file_time = stop - start
print(f'after 10k, name before {file_time} \n {start} \n {stop}')


# Create a file whose name sorts *after* the existing entries,
# i.e. an insertion at the end of the directory tree.
name3 = 20010

start = time.time()
file_name = 'tmp_thousands_of_files/' + str(name3)
f = open(file_name, 'w')
f.write('aaa')
f.close()

stop = time.time()

file_time = stop - start
print(f'after 10k, name after {file_time} \n {start} \n {stop}')

"""
result

c:\>python several_files_in_one_folder.py
one file time 0.0
 1523476699.5144918
 1523476699.5144918
after 10k, name before 0.015625953674316406
 1523476714.622918
 1523476714.6385438
after 10k, name after 0.0
 1523476714.6385438
 1523476714.6385438
"""


used: Python 3.6.1, windows 8.1, SSD drive
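
(Side note: time.time() on Windows typically ticks in steps of roughly 15.6 ms,
which is why the timings above come out as exactly 0.0 or 0.015625. A variant of
the same single-file timing using time.perf_counter(), which has much finer
resolution, could look like the sketch below; the file name '09999' is just an
assumed fresh name that sorts before the existing entries in the test directory
created above.)

import time

# Same single-file timing as above, but with the high-resolution clock
# time.perf_counter() instead of time.time().
# Assumes the tmp_thousands_of_files directory from the script above exists.
file_name = 'tmp_thousands_of_files/' + '09999'   # fresh name, sorts before the rest

start = time.perf_counter()
f = open(file_name, 'w')
f.write('aaa')
f.close()
stop = time.perf_counter()

print(f'one file time {stop - start:.6f} seconds')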

As you can see, an insertion near the beginning of the tree is much slower
than adding to the end. (Yes, I know that inserting into a list is slow as
well, but I once saw a VMS directory with 50k files, and the dir command
printed 5-10 files, then waited a few seconds before the next 5-10 files
... ;-) )
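
(If anyone wants to get a feel for that listing slowdown on their own machine,
a rough sketch along the same lines, again against the assumed test directory
created by the script above, could be:)

import os
import time

# Rough sketch: time how long it takes just to enumerate the test directory.
start = time.perf_counter()
with os.scandir('tmp_thousands_of_files') as it:
    entries = list(it)
stop = time.perf_counter()

print(f'listing {len(entries)} entries took {stop - start:.6f} seconds')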


BR,
George