memory leak with re.match

Mayling ge maylinge0903 at gmail.com
Tue Jul 4 05:01:08 EDT 2017


   Hi,

   My function, shown below, processes a file line by line. There are
   multiple error patterns defined, and each one needs to be applied to
   every line. I use multiprocessing.Pool to handle the file in blocks.

   Memory usage increases to 2 GB for a 1 GB file, and it stays at 2 GB
   even after the file has been processed. The file is closed at the end.

   If I comment out the call to re_pat.match, memory usage is normal and
   stays under 100 MB.
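
   (For reference, one way to check peak memory from inside the process is
   a sketch like the following, using the standard resource module; note
   that ru_maxrss is reported in kilobytes on Linux.)

   import resource

   def peak_rss_mb():
       # peak resident set size of this process, in MB (Linux units)
       return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024.0

   print('peak RSS: %.1f MB' % peak_rss_mb())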

   Am I using re in the wrong way? I cannot figure out how to fix the
   memory leak, and googling has not turned up an answer.

   import itertools
   import multiprocessing
   import re

   def line_match(lines, errors):
       for error in errors:
           try:
               re_pat = re.compile(error['pattern'])
           except Exception:
               print_error()  # error-reporting helper, defined elsewhere
               continue

           for line in lines:
               m = re_pat.match(line)
               # other code to handle the matched object
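
   In case it is relevant, one variant I considered is to compile every
   pattern once up front and reuse the compiled objects for all chunks, so
   re.compile runs once per pattern instead of once per pattern per chunk.
   This is only a sketch, not my actual code; the function names are made
   up:

   def compile_patterns(errors):
       # compile each pattern once; skip any that fail to compile
       compiled = []
       for error in errors:
           try:
               compiled.append(re.compile(error['pattern']))
           except Exception:
               print_error()  # same helper as above
       return compiled

   def line_match_compiled(lines, patterns):
       # patterns is the list returned by compile_patterns
       for re_pat in patterns:
           for line in lines:
               m = re_pat.match(line)
               # other code to handle the matched object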

   def process_large_file(fo):
       p = multiprocessing.Pool()
       results = []
       while True:
           lines = list(itertools.islice(fo, line_per_proc))
           if not lines:
               break
           # argument order matches line_match(lines, errors)
           results.append(p.apply_async(line_match, args=(lines, errors)))
       p.close()
       p.join()

   Note: I have omitted some code, as I think the significant difference is
   with/without re_pat.match(...).
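
   For reference, the functions above would be driven by something like
   this; all of the values below are placeholders, not my real ones:

   line_per_proc = 100000                  # placeholder chunk size
   errors = [{'pattern': r'ERROR: .*'}]    # placeholder pattern list

   with open('big.log') as fo:             # placeholder filename
       process_large_file(fo)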





   Regards,

   -Meiling


