memory consumption

Alexey zen.supagood at gmail.com
Fri Apr 2 08:32:29 EDT 2021


> I had the (mis)pleasure of dealing with a multi-terabyte postgresql 
> instance many years ago and figuring out why random scripts were eating 
> up system memory became quite common. 
> 
> All of our "ETL" scripts were either written in Perl, Java, or Python 
> but the results were always the same, if a process grew to using 1gb of 
> memory (as your case), then it never "released" it back to the OS. What 
> this basically means is that your script at one time did in fact 
> use/need 1GB of memory. That becomes the "high watermark" and in most 
> cases usage will stay at that level. And if you think about it, it makes 
> sense: your Python program went through the trouble of requesting memory 
> from the OS, so it makes little sense to give it back; if it needed 1GB 
> in the past, it will probably need 1GB in the future, and returning it 
> would just waste time on syscalls. Even the glibc docs state that 
> calling free() does not necessarily return the "freed" memory to the OS. 
> 
> There are basically two things you can try. First, try working in 
> smaller batches: 10,000 is a lot, try 100. Second, as you hinted, 
> try moving the work to a separate process. The simplest way to do this 
> would be to move away from modules that use threads and instead use 
> something that creates child processes with fork(). 

Thank you!

I decided to use a separate process: despite some improvements and
positive effects, there is still significant overhead when it runs
under Celery in the production environment.
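
For the archives, here is a rough sketch of the separate-process idea
(a simplified example, not my actual Celery task; process_batch and the
batch size are placeholders). The point is that whatever memory the child
grabs goes back to the OS when it exits, so the parent never inherits the
high watermark:

import multiprocessing as mp

def process_batch(batch):
    # Placeholder for the real ETL step; any large intermediate
    # structures built here die with the child process.
    return sum(len(str(item)) for item in batch)

def run_in_child(batch):
    # One short-lived worker per batch (maxtasksperchild=1), so the
    # parent never grows to the batch's memory high watermark.
    ctx = mp.get_context("spawn")
    with ctx.Pool(processes=1, maxtasksperchild=1) as pool:
        return pool.apply(process_batch, (batch,))

if __name__ == "__main__":
    data = list(range(100_000))
    batch_size = 100  # smaller batches, as suggested above
    for i in range(0, len(data), batch_size):
        run_in_child(data[i:i + batch_size])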

Another problem came up with "Using Connection Pools with
Multiprocessing or os.fork()", but I figured it out with 'Engine.dispose()'
and 'pool_pre_ping'. The solutions can be found in the official
SQLAlchemy documentation.
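
Roughly, that part looked like this (a simplified sketch; the connection
URL and the child task are placeholders, and the exact recipe is in the
SQLAlchemy docs under that heading):

import multiprocessing as mp
from sqlalchemy import create_engine, text

# pool_pre_ping checks each pooled connection with a lightweight ping
# before using it, so stale connections get replaced transparently.
engine = create_engine(
    "postgresql://user:pass@localhost/mydb",  # placeholder URL
    pool_pre_ping=True,
)

def child_task():
    # Drop the pool inherited from the parent so the child opens its own
    # connections instead of reusing sockets shared across the fork.
    engine.dispose()
    with engine.connect() as conn:
        conn.execute(text("SELECT 1"))

if __name__ == "__main__":
    p = mp.get_context("fork").Process(target=child_task)  # fork() is POSIX-only
    p.start()
    p.join()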

