Long running process - how to speed up?

Dennis Lee Bieber wlfraed at ix.netcom.com
Sat Feb 19 12:02:48 EST 2022


On Sat, 19 Feb 2022 11:28:31 +0000, Shaozhong SHI <shishaozhong at gmail.com>
declaimed the following:

>I have a CSV file of 932956 rows and have to have time.sleep in a Python
>script.  It takes a long time to process.
>
	I'd echo the others... Unless you explain better WHY you have .sleep()
(along with how often it is called, and how long each call sleeps), the
first recommendation would be to remove it.
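To see why the call count and duration matter, a back-of-the-envelope estimate helps. The sketch below assumes one .sleep() call per row of the 932956-row file; the per-call durations are hypothetical examples, not figures from the original question, and the helper name is made up for illustration.

```python
ROWS = 932956  # row count from the original question


def added_sleep_hours(rows, seconds_per_call, calls_per_row=1):
    """Lower bound (hours) on wall-clock time spent purely inside time.sleep().

    time.sleep() may oversleep, so the real cost can only be higher.
    """
    return rows * calls_per_row * seconds_per_call / 3600


# Hypothetical sleep durations, assuming one call per row.
for delay in (0.001, 0.01, 0.1):
    print(f"sleep({delay}) per row adds at least "
          f"{added_sleep_hours(ROWS, delay):.2f} hours")
```

Even a 0.01 s sleep per row adds roughly 2.6 hours of doing nothing; at 0.1 s it is nearly 26 hours.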

	The most common justification for .sleep() is CPU-BOUND processing that
needs to force context switches, so other operations get to run more often
than the system's scheduling quantum would allow. That is not normally a
concern given Python's GIL and the presence of multi-core chips.
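For completeness, the yielding pattern looks roughly like the sketch below: a CPU-bound loop that calls time.sleep(0) every so often. time.sleep() releases the GIL, so this gives other threads an opportunity to run. Since Python 3.2, CPython also asks the running thread to drop the GIL every sys.getswitchinterval() seconds (5 ms by default), which is one reason the explicit yield is rarely needed. The function and parameter names are hypothetical.

```python
import sys
import threading
import time


def sum_squares(n, yield_every=0):
    """CPU-bound work; optionally yield via time.sleep(0) every yield_every steps."""
    total = 0
    for i in range(n):
        total += i * i
        if yield_every and i % yield_every == 0:
            # time.sleep() releases the GIL, giving other threads a chance to run.
            time.sleep(0)
    return total


def run_in_threads(n, yield_every, workers=2):
    """Run sum_squares in several threads and collect their results."""
    results = [None] * workers

    def work(idx):
        results[idx] = sum_squares(n, yield_every)

    threads = [threading.Thread(target=work, args=(k,)) for k in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print("switch interval:", sys.getswitchinterval())  # 0.005 unless changed
    print(run_in_threads(100_000, yield_every=1_000))
```

The yield changes only scheduling, never the result; it does not make the work itself any faster.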

	How are you processing the (near) million rows of that CSV? If you are
loading all of them into a large list, you could be running into Python list
reallocations, or OS page swapping (though I wouldn't expect that on most
modern systems -- maybe on a Raspberry-Pi/BeagleBone Black). Note:

>>> import sys
>>> sys.getsizeof("a")
50
>>> 

even a one-character string occupies 50 bytes (64-bit CPython; exact figures
vary by version). An EMPTY string takes up 51 bytes... Except for the empty
string, that comes to about 49+<#chars> IF all characters are ASCII. Strings
containing other characters carry a larger fixed header, and if any character
needs more than 8 bits, the #chars must be multiplied by either 2 or 4,
depending upon the widest representation needed.
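The per-character widths can be checked directly. The sketch below is CPython-specific (it relies on the PEP 393 string layout; other implementations may not report meaningful sys.getsizeof() values). It measures how much sys.getsizeof() grows when one more character is appended, which cancels out the fixed header. The helper name bytes_per_char is hypothetical.

```python
import sys


def bytes_per_char(ch, n=10):
    """Growth in sys.getsizeof() when one more copy of ch is appended.

    Under CPython's flexible string representation this is 1, 2 or 4,
    depending on the widest character in the string.
    """
    return sys.getsizeof(ch * (n + 1)) - sys.getsizeof(ch * n)


# ASCII, Latin-1, BMP (Euro sign) and astral-plane (emoji) characters.
for ch in ("a", "\u00e9", "\u20ac", "\U0001F600"):
    print(repr(ch), "bytes/char:", bytes_per_char(ch),
          "total for 1 char:", sys.getsizeof(ch))
```

On CPython 3.3 and later expect 1 for "a" and "é", 2 for "€", and 4 for the emoji; the absolute totals printed alongside differ between versions and builds.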

	If you are doing read-one-record, process-one-record, repeat -- and
have the .sleep() inside that loop... definitely remove the .sleep(). That
loop is already I/O bound: the fastest you can go is determined by how
rapidly the OS can transfer records from the file system to your program.
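A read-one-record, process-one-record loop with no .sleep() might look like the sketch below. For contrast it also loads the same file into one big list, and it uses tracemalloc to compare peak memory. The sample file, its columns, and the function names are hypothetical; replace the row counting with your real per-row processing.

```python
import csv
import os
import tempfile
import time
import tracemalloc


def write_sample_csv(path, n_rows):
    """Write a hypothetical n_rows x 3 CSV file to experiment with."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        for i in range(n_rows):
            writer.writerow([i, f"name{i}", i * 0.5])


def load_all(path):
    """Read every row into one list first (all rows alive at once)."""
    with open(path, newline="") as f:
        rows = list(csv.reader(f))
    return len(rows)


def stream(path):
    """Read one row, process it, repeat -- no .sleep() anywhere."""
    count = 0
    with open(path, newline="") as f:
        for _row in csv.reader(f):
            count += 1  # stand-in for real per-row processing
    return count


def measure(func, path):
    """Return (result, peak traced bytes, elapsed seconds) for func(path)."""
    tracemalloc.start()
    start = time.perf_counter()
    try:
        result = func(path)
        elapsed = time.perf_counter() - start
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak, elapsed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.csv")
        write_sample_csv(path, 100_000)
        for func in (load_all, stream):
            rows, peak, secs = measure(func, path)
            print(f"{func.__name__}: {rows} rows, "
                  f"peak {peak / 1e6:.1f} MB, {secs:.2f} s")
```

The streaming version's peak memory should stay roughly flat as the row count grows, while the list version grows with it. Timings under tracemalloc are inflated, so use them only for relative comparison.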


-- 
	Wulfraed                 Dennis Lee Bieber         AF6VN
	wlfraed at ix.netcom.com    http://wlfraed.microdiversity.freeddns.org/

