memory consumption

Lars Liedtke liedtke at punkt.de
Mon Mar 29 07:45:05 EDT 2021


Hello Alexej,

May I ask, perhaps stupidly, why you care about that in general? Please
don't get me wrong, I don't want to criticize you; this is rather meant
to be a (thought-)provoking question.
Normally your OS kernel and the Python interpreter get along pretty well,
and when there is free memory to be had, or no real need to release
allocated memory, why force it? Python will release memory when needed by
running the gc.
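For what it's worth, a quick way to see whether a forced collection actually
gives memory back to the OS is to watch the process RSS around a gc.collect()
call. A minimal sketch, assuming the third-party psutil package is installed:

    import gc
    import os

    import psutil

    proc = psutil.Process(os.getpid())

    def rss_mb():
        # Resident set size of the current process, in megabytes.
        return proc.memory_info().rss / (1024 * 1024)

    print("before collect: %.1f MB" % rss_mb())
    unreachable = gc.collect()   # force a full collection
    print("collected %d unreachable objects" % unreachable)
    print("after collect:  %.1f MB" % rss_mb())

If the RSS stays high even though gc.collect() reports few unreachable
objects, the memory is often being held by the allocator for reuse rather
than actually leaking.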

Have you tried running your task over all the data you have? Did it
crash your system or prevent other processes from having enough memory?
If not: why care?

I know that there can be (good) reasons to care, but as long as your
tasks run fine, without clogging your system, in my opinion there might
be nothing to worry about.

Cheers

Lars

On 29.03.21 at 12:12, Alexey wrote:
> Hello everyone!
> I'm experiencing problems with memory consumption.
>
> I have a class which does an ETL job. What's happening inside:
>  - fetch existing objects from the DB via SQLAlchemy
>  - iterate over the raw data
>  - create new / update existing objects
>  - commit the changes
>
> Before processing the data I create an internal cache (a dictionary) and store all existing objects in it.
> Every 10000 items I do a bulk insert and flush. At the end I run commit.
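Not Alexey's actual code, but a minimal sketch of the batching pattern
described above, assuming a Flask-SQLAlchemy db.session; MyModel, raw_rows,
make_key and update_from_row are hypothetical names:

    from app import db              # assumed Flask-SQLAlchemy instance
    from app.models import MyModel  # assumed ORM model

    BATCH_SIZE = 10000

    # internal cache: key -> already-existing ORM object
    cache = {make_key(obj): obj for obj in db.session.query(MyModel)}

    pending = []
    for i, row in enumerate(raw_rows, start=1):
        key = make_key(row)
        if key in cache:
            update_from_row(cache[key], row)       # update existing object
        else:
            pending.append(MyModel(**row))         # create new object

        if i % BATCH_SIZE == 0:
            db.session.bulk_save_objects(pending)  # bulk insert
            db.session.flush()
            pending = []

    db.session.bulk_save_objects(pending)
    db.session.commit()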
>
> The problem: before executing, my interpreter process weighs ~100 MB; after the first run memory increases to about 500 MB,
> and after the second run it weighs 1 GB. If I keep running this class, memory won't increase any further, so I think
> it's not a memory leak, but rather that Python won't release the allocated memory back to the OS. Maybe I'm wrong.
>
> What I tried after executing:
>  - gc.collect()
>  - created snapshots with tracemalloc and searched for garbage (see the sketch after this list): diff =
>    snapshot_after_run - snapshot_before_run
>  - searched with the "objgraph" library for references to the internal cache (the dictionary
>    containing the elements from the DB)
>  - cleared the cache (dictionary)
>  - db.session.expire_all()
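For reference, a minimal sketch of the tracemalloc snapshot diff mentioned in
the list above; run_etl_task() is a hypothetical stand-in for one run of the
ETL class:

    import gc
    import tracemalloc

    tracemalloc.start()

    snapshot_before_run = tracemalloc.take_snapshot()
    run_etl_task()                     # one run of the ETL class
    gc.collect()
    snapshot_after_run = tracemalloc.take_snapshot()

    # Top allocation sites that differ between the two snapshots.
    for stat in snapshot_after_run.compare_to(snapshot_before_run, "lineno")[:10]:
        print(stat)

One caveat: tracemalloc only tracks allocations made by Python after
tracemalloc.start(), so it cannot show memory the process keeps at the OS
level after the objects themselves have been freed.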
>
> This class runs as a periodic Celery task. So once each worker has executed this class at least twice,
> every Celery worker needs 1 GB of RAM. Before Celery there was a cron script and this class was executed via an API call,
> and the problem was the same. So no matter how I run it, the interpreter consumes 1 GB of RAM after two runs.
>
> I see a few solutions to this problem:
> 1. Execute this class in a separate process (a rough sketch follows below). But I had a few errors when the same SQLAlchemy connection was shared
> between different processes.
> 2. Restart the Celery worker after executing this task by throwing an exception.
> 3. Use a separate queue for such tasks, but then the worker will stay idle most of the time.
> All of this looks like a crutch. Do I have any other options?
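Not a drop-in fix, but a rough sketch of option 1 for reference: the
connection-sharing errors usually go away if the engine and session are
created inside the child process instead of being inherited from the parent,
and all memory is returned to the OS when the child exits. database_uri and
run_etl() are hypothetical names:

    import multiprocessing as mp

    def _run_etl_in_child(database_uri):
        # Build the engine/session inside the child so nothing is shared
        # with the parent process.
        from sqlalchemy import create_engine
        from sqlalchemy.orm import sessionmaker

        engine = create_engine(database_uri)
        session = sessionmaker(bind=engine)()
        try:
            run_etl(session)           # the ETL job itself
            session.commit()
        finally:
            session.close()
            engine.dispose()

    def run_isolated(database_uri):
        # "spawn" starts a fresh interpreter, so no connections or caches
        # are inherited from the parent process.
        ctx = mp.get_context("spawn")
        p = ctx.Process(target=_run_etl_in_child, args=(database_uri,))
        p.start()
        p.join()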
>
> I'm using:
> Python - 3.6.13
> Celery - 4.1.0
> Flask-RESTful - 0.3.6
> Flask-SQLAlchemy - 2.3.2
>
> Thanks in advance!

-- 
---
punkt.de GmbH
Lars Liedtke
.infrastructure

Kaiserallee 13a	
76133 Karlsruhe

Tel. +49 721 9109 500
https://infrastructure.punkt.de
info at punkt.de

AG Mannheim 108285
Managing directors: Jürgen Egeling, Daniel Lienert, Fabian Stein


